Methods for tagging DNA-encoded libraries

ABSTRACT

The present invention relates to oligonucleotide-encoded libraries and methods of tagging such libraries. In particular, the methods and oligonucleotides can include one or more 2′-substituted nucleotides, such as 2′-O-methyl or 2′-fluoro nucleotides, and other conditions or reagents to enhance enzyme ligation or one or more chemical functionalities to support chemical ligation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of U.S. Provisional Application Nos. 61/531,820, filed Sep. 7, 2011, and 61/536,929, filed Sep. 20, 2011, each of which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

In general, this invention relates to DNA-encoded libraries of compounds and methods of using and creating such libraries. The invention also relates to compositions for use in such libraries.

DNA-encoded combinatorial libraries afford many benefits for drug discovery. These libraries can provide a large number of diverse compounds that can be rapidly screened and interrogated. To further increase complexity, various steps of the discovery process can be programmed and automated. These steps include the use of multi-step, split-and-pool synthesis to add building blocks to atomic or polyatomic scaffolds and the use of enzymatic and/or chemical ligation to add DNA tags that encode both the synthetic steps and the building blocks.

Despite these benefits, numerous issues can arise when very large or complex libraries must be synthesized and deconvoluted. As the size of the library increases, improved methods may be needed to provide high yields of tag ligation. To create libraries under diverse reaction conditions, stable ligated nucleotide constructs would be beneficial, such as constructs that are stable under conditions of high pH and elevated temperature. To simplify deconvolution of tags, the sequence of the tags could be recognized by DNA- or RNA-dependent polymerases, such that tag population demographics can be determined by template-dependent polymerization and sequence determination. Difficulties may arise when creating a library having all of these beneficial attributes. Accordingly, there exists a need for improved, more robust methods of screening and identifying small compounds in DNA-encoded libraries.

SUMMARY OF THE INVENTION

The present invention features methods of creating libraries, where the method includes one or more conditions that improve single-stranded ligation of tags, and compositions for use in creating libraries. Exemplary conditions include the use of one or more 2′-substituted bases within the tags, such as 2′-O-methyl or 2′-fluoro; the use of tags of particular length; the use of one or more enzymes; optionally, the inclusion of error-recognition capabilities in the tag design; and/or the use of one or more agents during ligation.

Accordingly, the invention features a method of tagging a first library including an oligonucleotide-encoded chemical entity, the method including: (i) providing a headpiece having a first functional group and a second functional group, where the headpiece includes at least one 2′-substituted nucleotide; (ii) binding the first functional group of the headpiece to a first component of the chemical entity, where the headpiece is directly connected to the first component or the headpiece is indirectly connected to the first component by a bifunctional linker (e.g., a poly ethylene glycol linker or —(CH₂CH₂O)_(n)CH₂CH₂—, where n is an integer from 1 to 50); and (iii) binding the second functional group of the headpiece to a first building block tag to form a complex, where steps (ii) and (iii) can be performed in any order and where the first building block tag encodes for the binding reaction of step (ii), thereby providing a tagged library.

In some embodiments, the headpiece includes a 2′-substituted nucleotide at one or more of the 5′-terminus, the 3′-terminus, or the internal position of the headpiece. In particular embodiments, the headpiece includes the 2′-substituted nucleotide and the second functional group at the 5′-terminus or at the 3′-terminus.

In other embodiments, the first building block tag includes at least one (e.g., at least two, three, four, five, or more) 2′-substituted nucleotides. In particular embodiments, the first building block tag includes a 2′-substituted nucleotide at one or more of the 5′-terminus, the 3′-terminus, or the internal position of the first building block tag (e.g., a 2′-O-methyl nucleotide or a 2′-fluoro nucleotide at both of the 5′- and 3′-termini). In some embodiments, the first building block tag includes a protecting group at the 3′-terminus or at the 5′-terminus.

In any of the embodiments described herein, the 2′-substituted nucleotide is a 2′-O-methyl nucleotide (e.g., 2′-O-methyl guanine or 2′-O-methyl uracil) or a 2′-fluoro nucleotide (e.g., 2′-fluoro guanine or 2′-fluoro uracil).

In any of the above embodiments, step (ii) may include joining, binding, or operatively associating the headpiece directly to the first component (e.g., a scaffold or a first building block). In yet other embodiments, step (ii) includes binding the headpiece indirectly to the first component (e.g., a scaffold or a first building block) via a bifunctional linker (e.g., the method includes binding the headpiece with the first functional group of the linker and binding the first component with the second functional group of the linker).

In any of the above embodiments, the method may further include (iv) binding a second building block tag to the 5′-terminus or 3′-terminus of the complex; and (v) binding a second component (e.g., a first building block or a second building block) of the chemical library to the first component, where steps (iv) and (v) can be performed in any order. In some embodiments, the second building block tag encodes for the binding reaction of step (v). In other embodiments, step (iv) may include binding the second building block tag to the 5′-terminus of the complex; the complex includes a phosphate group at the 5′-terminus; and the second building block tag includes a hydroxyl group at both of the 3′- and 5′-termini. In other embodiments, step (iv) may further include purifying the complex and reacting the complex with a polynucleotide kinase to form a phosphate group on the 5′-terminus prior to binding the second building block tag. In other embodiments, step (iv) may include binding the second building block tag to the 3′-terminus of the complex; the complex includes a protecting group at the 3′-terminus; and the second building block tag includes a phosphate group at the 5′-terminus and a protecting group at the 3′-terminus. In yet other embodiments, step (iv) may further include reacting the complex with a hydrolyzing agent to release the protecting group from the complex prior to binding the second building block tag to the complex.

In further embodiments, the second building block tag includes a 2′-substituted nucleotide (e.g., a 2′-O-methyl nucleotide or a 2′-fluoro nucleotide) at one or more of the 5′-terminus, the 3′-terminus, or the internal position of the second building block tag (e.g., a 2′-O-methyl nucleotide and/or a 2′-fluoro nucleotide at both of the 5′- and 3′-termini).

In some embodiments, step (iv) may include the use of an RNA ligase (e.g., T4 RNA ligase) and/or a DNA ligase (e.g., a ssDNA ligase) to bind the second building block tag to the complex (e.g., may include the use of both RNA ligase and the DNA ligase).

In other embodiments, step (iii) may include the use of an RNA ligase (e.g., T4 RNA ligase) and/or a DNA ligase (e.g., ssDNA ligase) to bind the headpiece to the first building block tag (e.g., may include the use of both RNA ligase and the DNA ligase).

In further embodiments, step (iii) and/or step (iv), if present, may include the use of poly ethylene glycol and/or one or more soluble multivalent cations (e.g., magnesium chloride, manganese (II) chloride, or hexamine cobalt (III) chloride). In some embodiments, the poly ethylene glycol is in an amount from about 25% (w/v) to about 35% (w/v) (e.g., from about 25% (w/v) to about 30% (w/v), from about 30% (w/v) to about 35% (w/v), or about 30% (w/v)). In other embodiments, the poly ethylene glycol has an average molecular weight from about 3,000 to about 5,500 Daltons (e.g., about 4,600 Daltons). In other embodiments, the one or more soluble multivalent cations are in an amount of from about 0.05 mM to about 10.5 mM (e.g., from 0.05 mM to 0.5 mM, from 0.05 mM to 0.75 mM, from 0.05 mM to 1.0 mM, from 0.05 mM to 1.5 mM, from 0.05 mM to 2.0 mM, from 0.05 mM to 3.0 mM, from 0.05 mM to 4.0 mM, from 0.05 mM to 5.0 mM, from 0.05 mM to 6.0 mM, from 0.05 mM to 7.0 mM, from 0.05 mM to 8.0 mM, from 0.05 mM to 9.0 mM, from 0.05 mM to 10.0 mM, from 0.1 mM to 0.5 mM, from 0.1 mM to 0.75 mM, from 0.1 mM to 1.0 mM, from 0.1 mM to 1.5 mM, from 0.1 mM to 2.0 mM, from 0.1 mM to 3.0 mM, from 0.1 mM to 4.0 mM, from 0.1 mM to 5.0 mM, from 0.1 mM to 6.0 mM, from 0.1 mM to 7.0 mM, from 0.1 mM to 8.0 mM, from 0.1 mM to 9.0 mM, from 0.1 mM to 10.0 mM, from 0.1 mM to 10.5 mM, from 0.5 mM to 0.75 mM, from 0.5 mM to 1.0 mM, from 0.5 mM to 1.5 mM, from 0.5 mM to 2.0 mM, from 0.5 mM to 3.0 mM, from 0.5 mM to 4.0 mM, from 0.5 mM to 5.0 mM, from 0.5 mM to 6.0 mM, from 0.5 mM to 7.0 mM, from 0.5 mM to 8.0 mM, from 0.5 mM to 9.0 mM, from 0.5 mM to 10.0 mM, from 0.5 mM to 10.5 mM, from 0.75 mM to 1.0 mM, from 0.75 mM to 1.5 mM, from 0.75 mM to 2.0 mM, from 0.75 mM to 3.0 mM, from 0.75 mM to 4.0 mM, from 0.75 mM to 5.0 mM, from 0.75 mM to 6.0 mM, from 0.75 mM to 7.0 mM, from 0.75 mM to 8.0 mM, from 0.75 mM to 9.0 mM, from 0.75 mM to 10.0 mM, from 0.75 mM to 10.5 mM, from 1.0 mM to 1.5 mM, from 1.0 mM to 2.0 mM, from 1.0 mM to 3.0 mM, from 1.0 mM to 4.0 mM, from 1.0 mM to 5.0 mM, from 1.0 mM to 6.0 mM, from 1.0 mM to 7.0 mM, from 1.0 mM to 8.0 mM, from 1.0 mM to 9.0 mM, from 1.0 mM to 10.0 mM, from 1.0 mM to 10.5 mM, from 1.5 mM to 2.0 mM, from 1.5 mM to 3.0 mM, from 1.5 mM to 4.0 mM, from 1.5 mM to 5.0 mM, from 1.5 mM to 6.0 mM, from 1.5 mM to 7.0 mM, from 1.5 mM to 8.0 mM, from 1.5 mM to 9.0 mM, from 1.5 mM to 10.0 mM, from 1.5 mM to 10.5 mM, from 2.0 mM to 3.0 mM, from 2.0 mM to 4.0 mM, from 2.0 mM to 5.0 mM, from 2.0 mM to 6.0 mM, from 2.0 mM to 7.0 mM, from 2.0 mM to 8.0 mM, from 2.0 mM to 9.0 mM, from 2.0 mM to 10.0 mM, and from 2.0 mM to 10.5 mM). In some embodiments, one or more multivalent cations are in an amount of about 1 mM (e.g., from 0.5 mM to 1.5 mM). In a particular embodiment, the multivalent cation is in the form of hexamine cobalt (III) chloride.
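By way of illustration only, the reagent ranges recited above can be expressed as a simple range check. The following Python sketch is not part of the described method: the names `RANGES`, `LigationConditions`, and `out_of_range` are hypothetical, and each "about" bound is treated as a hard limit purely for simplicity.

```python
from dataclasses import dataclass

# Ranges taken from the paragraph above; "about" is treated as a hard bound
# here, which is a simplifying assumption.
RANGES = {
    "peg_percent_wv": (25.0, 35.0),     # poly ethylene glycol, % (w/v)
    "peg_avg_mw_da": (3000.0, 5500.0),  # average molecular weight, Daltons
    "cation_mm": (0.05, 10.5),          # soluble multivalent cation, mM
}


@dataclass(frozen=True)
class LigationConditions:
    """Hypothetical container for one set of ligation conditions."""
    peg_percent_wv: float
    peg_avg_mw_da: float
    cation_mm: float


def out_of_range(conditions: LigationConditions) -> list:
    """Return the names of parameters that fall outside the stated ranges."""
    return [
        name
        for name, (low, high) in RANGES.items()
        if not low <= getattr(conditions, name) <= high
    ]
```

For instance, conditions of 30% (w/v) poly ethylene glycol with an average molecular weight of 4,600 Daltons and 1 mM cation fall inside all three ranges, so `out_of_range(LigationConditions(30.0, 4600.0, 1.0))` returns an empty list.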

In other embodiments, the method further includes separating the complex from any unreacted tag or unreacted headpiece before any one of binding steps (ii)-(v). In other embodiments, the method further includes purifying the complex before any one of binding steps (ii)-(v). In other embodiments, the method further includes binding one or more additional components (e.g., a scaffold or a first building block) and one or more additional building block tags, in any order and after any one of binding steps (ii)-(v).

The invention also features a method of tagging a first library including an oligonucleotide-encoded chemical entity, the method including: (i) providing a headpiece having a first functional group and a second functional group, where the headpiece includes a 2′-substituted nucleotide at the 5′-terminus, optionally one or more nucleotides at the internal position of the headpiece, and a protecting group at the 2′-position and/or the 3′-position at the 3′-terminus; (ii) binding the first functional group of the headpiece to a first component of the chemical entity, where the headpiece is directly connected to the first component or the headpiece is indirectly connected to the first component by a bifunctional linker; and (iii) binding the second functional group of the headpiece to a first building block tag, where the first building block tag includes a 2′-substituted nucleotide and a hydroxyl group at the 5′-terminus, optionally one or more nucleotides at the internal position of the tag, and a 2′-substituted nucleotide and a hydroxyl group at the 3′-terminus; where steps (ii) and (iii) can be performed in any order and where the first building block tag encodes for the binding reaction of step (ii), thereby providing a tagged library.

In some embodiments, the 2′-substituted nucleotide is a 2′-O-methyl nucleotide (e.g., 2′-O-methyl guanine) or a 2′-fluoro nucleotide (e.g., 2′-fluoro guanine). In other embodiments, one or more nucleotides at the internal position of the headpiece are 2′-deoxynucleotides. In yet other embodiments, the bifunctional linker is a poly ethylene glycol linker (e.g., —(CH₂CH₂O)_(n)CH₂CH₂—, where n is an integer from 1 to 50).

In other embodiments, one or more nucleotides (e.g., one or more 2′-deoxynucleotides) are present at the internal position of the headpiece or the tag.

In some embodiments, step (iii) may include the use of one or more soluble multivalent cations (e.g., magnesium chloride, manganese (II) chloride, or hexamine cobalt (III) chloride), poly ethylene glycol (e.g., having an average molecular weight of about 4,600 Daltons), and RNA ligase (e.g., T4 RNA ligase).

In another aspect, the invention features methods to identify and/or discover a chemical entity, the method including tagging a first library including an oligonucleotide-encoded chemical entity (e.g., including steps (i) to (iii) and optionally including steps (iv) to (v)) and selecting for a particular characteristic or function (e.g., selecting for binding to a protein target including exposing the oligonucleotide-encoded chemical entity or chemical entity to the protein target and selecting the one or more oligonucleotide-encoded chemical entities or chemical entities that bind to the protein target (e.g., by using size exclusion chromatography)). The invention also features a complex including a headpiece and a building block tag, where the tag includes from 5 to 20 nucleotides, a 2′-substituted nucleotide at the 5′-terminus, and a 2′-substituted nucleotide at the 3′-terminus. In some embodiments, the 2′-substituted nucleotide at the 5′-terminus and/or 3′-terminus is a 2′-O-methyl nucleotide (e.g., 2′-O-methyl guanine or 2′-O-methyl uracil) or a 2′-fluoro nucleotide (e.g., 2′-fluoro guanine or 2′-fluoro uracil). In particular embodiments, the headpiece includes a hairpin structure. In some embodiments, the headpiece includes a 2′-substituted nucleotide at one or more of the 5′-terminus, the 3′-terminus, or the internal position of the headpiece. In other embodiments, the headpiece further includes a preadenylated 5′-terminus. In yet other embodiments, the headpiece includes from 5 to 20 nucleotides.

In any of the above embodiments, the headpiece, the first building block tag, the second building block tag, or the one or more additional building block tags, if present, includes a preadenylated 5′-terminus.

In any of the above embodiments, the method further includes binding one or more (e.g., one, two, three, four, five, six, seven, eight, nine, or ten) additional building block tags to the complex and binding one or more (e.g., one, two, three, four, five, six, seven, eight, nine, or ten) additional components (e.g., scaffolds or building blocks) to the complex, where the one or more additional building block tags encode for the one or more additional components or encode for the binding reaction of one or more additional components, thereby providing a tagged library.

In any of the above embodiments, the 2′-substituted nucleotide is a 2′-O-methyl nucleotide, such as 2′-O-methyl guanine, 2′-O-methyl uracil, 2′-O-methyl adenosine, 2′-O-methyl thymidine, 2′-O-methyl inosine, 2′-O-methyl cytidine, or 2′-O-methyl diamino purine. Alternatively, in any of the above embodiments, the 2′-substituted nucleotide is a 2′-fluoro nucleotide, such as 2′-fluoro guanine, 2′-fluoro uracil, 2′-fluoro adenosine, 2′-fluoro thymidine, 2′-fluoro inosine, 2′-fluoro cytidine, or 2′-fluoro diamino purine.

In any of the above embodiments, the RNA ligase is T4 RNA ligase and/or the DNA ligase is a ssDNA ligase.

In any of the above embodiments, the method includes a plurality of headpieces. In some embodiments of this method, each headpiece of the plurality of headpieces includes an identical sequence region and a different encoding region. In particular embodiments, the identical sequence region is a primer binding region. In other embodiments, the different encoding region is an initial building block tag that encodes for the headpiece or for an addition of an initial component.

In any of the above embodiments, binding in at least one of steps (ii)-(iv), if present, includes enzymatic ligation and/or chemical ligation. In some embodiments, enzymatic ligation includes use of an RNA ligase (e.g., T4 RNA ligase) or a DNA ligase (e.g., ssDNA ligase). In other embodiments, enzymatic ligation includes use of an RNA ligase (e.g., T4 RNA ligase) and a DNA ligase (e.g., ssDNA ligase). In some embodiments, chemical ligation includes use of one or more chemically co-reactive pairs (e.g., a pair including an optionally substituted alkynyl group with an optionally substituted azido group; a pair including an optionally substituted diene having a 4π electron system (e.g., an optionally substituted 1,3-unsaturated compound, such as optionally substituted 1,3-butadiene, 1-methoxy-3-trimethylsilyloxy-1,3-butadiene, cyclopentadiene, cyclohexadiene, or furan) with an optionally substituted dienophile or an optionally substituted heterodienophile having a 2π electron system (e.g., an optionally substituted alkenyl group or an optionally substituted alkynyl group); a pair including a nucleophile (e.g., an optionally substituted amine or an optionally substituted thiol) with a strained heterocyclyl electrophile (e.g., optionally substituted epoxide, aziridine, aziridinium ion, or episulfonium ion); a pair including a phosphorothioate group with an iodo group (e.g., a phosphorothioate group at the 3′-terminus and an iodo group at the 5′-terminus); or a pair including an aldehyde group with an amino group (e.g., a primary amino or a secondary amino group, including a hydrazido group)). In particular embodiments, the chemically co-reactive pair produces a resultant spacer having a length from about 4 to about 24 atoms (e.g., from about 4 to about 10 atoms). In other embodiments, chemical ligation includes use of a phosphorothioate group (e.g., at the 3′-terminus) and an iodo group (e.g., at the 5′-terminus). In further embodiments, chemical ligation includes a splint oligonucleotide in the binding reaction. In some embodiments, the chemical ligation includes use of a phosphorothioate group (e.g., at the 3′-terminus of the headpiece, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present), an iodo group (e.g., at the 5′-terminus of the headpiece, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present), and a splint oligonucleotide in the binding reaction, where the use avoids use of one or more protecting groups. In other embodiments, chemical ligation of multiple tags comprises alternating use of orthogonal chemically co-reactive pairs (e.g., any two or more chemically co-reactive pairs described herein) for ligating successive tags.

In any of the above embodiments, the headpiece may include a single-stranded (e.g., hairpin) structure.

In any of the above embodiments, the headpiece, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present, includes a sequence that is substantially identical (e.g., at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical) to any sequence described herein (e.g., the sequence in any one of SEQ ID NOs: 6-21, 26, 27, or 29-31), or a sequence that is complementary to a sequence that is substantially identical (e.g., at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical) to any sequence described herein (e.g., the sequence in any one of SEQ ID NOs: 6-21, 26, 27, or 29-31). In particular embodiments, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present, further includes a sequence that is substantially identical (e.g., at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical) to the sequence of SEQ ID NO: 1 or SEQ ID NO: 2.
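For illustration, one simple way to compute a percent identity between two sequences is sketched below in Python. This sketch assumes an ungapped, position-by-position comparison of two sequences of equal length; that assumption, the function name `percent_identity`, and the example sequences are not taken from this disclosure, and an alignment-based measure may be more appropriate in practice.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with the same base (ungapped, equal length only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare case-insensitively, position by position.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold_percent: float) -> bool:
    """True if the two sequences are at least threshold_percent identical."""
    return percent_identity(seq_a, seq_b) >= threshold_percent
```

With the made-up sequences `ACGTACGTAC` and `ACGTACGTAA`, nine of ten positions match, so the pair would meet a 90% threshold but not a 95% threshold.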

In any of the above embodiments, the methods or complexes include only single-stranded molecules, where the headpiece, the first building block tag, the second building block tag, and/or the one or more additional building block tags are single-stranded. In some embodiments, one or more of the single-stranded molecules have a hairpin structure. In particular embodiments, the headpiece includes a hairpin structure and the one or more building block tags do not include a hairpin structure.

In any of the above embodiments, the method further comprises one or more optional steps to diversify the library or to interrogate the members of the library, as described herein. In some embodiments, the method further comprises identifying a small drug-like library member that binds or inactivates a protein of therapeutic interest. In other embodiments, the method further comprises contacting a member of the library with a biological target under conditions suitable for at least one member of the library to bind to the target, removing one or more library members that do not bind to the target, and analyzing the one or more oligonucleotide tags associated with them.

As described herein, the use of single-stranded molecules (e.g., including hairpin molecules) could have numerous benefits. Accordingly, in any of the embodiments described herein, the methods and complexes include a headpiece, one or more building block tags, a complex, a chemical entity, a molecule, or any member of a tagged library having decreased mass, increased solubility (e.g., in an organic solvent), decreased cost, increased reactivity, increased target accessibility, decreased hydrodynamic radius, and/or increased accuracy of analytical assessments, as compared to a method including one or more double-stranded molecules (e.g., a double-stranded headpiece or a double-stranded building block tag). In some embodiments, each of the building block tags (e.g., the first building block tag, the second building block tag, and/or one or more additional building block tags, if present) has about the same mass (e.g., each building block tag has a mass that is about +/−10% from the average mass between two or more building block tags). In particular embodiments, the building block tag has a decreased mass (e.g., less than about 15,000 Daltons, about 14,000 Daltons, about 13,000 Daltons, about 12,000 Daltons, about 11,000 Daltons, about 10,000 Daltons, about 9,000 Daltons, about 8,000 Daltons, about 7,500 Daltons, about 7,000 Daltons, about 6,000 Daltons, about 6,500 Daltons, about 5,000 Daltons, about 5,500 Daltons, about 4,000 Daltons, about 4,500 Daltons, or about 3,000 Daltons) compared to a double-stranded tag (e.g., a double-stranded tag having a mass of about 15,000 Daltons, about 14,000 Daltons, about 13,000 Daltons, or about 12,000 Daltons). In other embodiments, the building block tag has a reduced length compared to a double-stranded tag (e.g., a double-stranded tag having a length of less than about 20 nucleotides, less than about 19 nucleotides, less than about 18 nucleotides, less than about 17 nucleotides, less than about 16 nucleotides, less than about 15 nucleotides, less than about 14 nucleotides, less than about 13 nucleotides, less than about 12 nucleotides, less than about 11 nucleotides, less than about 10 nucleotides, less than about 9 nucleotides, less than about 8 nucleotides, or less than about 7 nucleotides). In some embodiments, one or more building block tags or members of the library lack a primer binding region and/or a constant region (e.g., during a selection step, such as selection using size exclusion chromatography). In some embodiments, one or more building block tags or members of the library have a reduced constant region (e.g., a length less than about 30 nucleotides, less than about 25 nucleotides, less than about 20 nucleotides, less than about 19 nucleotides, less than about 18 nucleotides, less than about 17 nucleotides, less than about 16 nucleotides, less than about 15 nucleotides, less than about 14 nucleotides, less than about 13 nucleotides, less than about 12 nucleotides, less than about 11 nucleotides, less than about 10 nucleotides, less than about 9 nucleotides, less than about 8 nucleotides, or less than about 7 nucleotides). In other embodiments, the methods include a headpiece that encodes for a molecule, a portion of a chemical entity, a binding reaction (e.g., chemical or enzymatic ligation) of a step, or the identity of a library, where the encoding headpiece eliminates the need of an additional building block tag to encode such information.
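The "about the same mass" condition above (each building block tag within about +/−10% of the average mass of the tags) can be illustrated with a short Python sketch. The function name `masses_about_equal`, the default tolerance argument, and the example masses are assumptions introduced for illustration, and "about" is again treated as a hard bound.

```python
def masses_about_equal(masses_da, tolerance=0.10):
    """True if every mass is within +/- tolerance of the mean of all masses.

    masses_da: tag masses in Daltons (two or more values).
    tolerance: fractional deviation allowed from the mean (0.10 = 10%).
    """
    if len(masses_da) < 2:
        raise ValueError("need the masses of two or more building block tags")
    mean_mass = sum(masses_da) / len(masses_da)
    return all(abs(m - mean_mass) <= tolerance * mean_mass for m in masses_da)
```

Under these assumptions, hypothetical tag masses of 3,000, 3,100, and 2,950 Daltons satisfy the condition, whereas 3,000 and 5,000 Daltons do not, because each deviates from the 4,000 Dalton average by 25%.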

In any of the above embodiments, an oligonucleotide (e.g., the headpiece, the first building block tag, the second building block tag, and/or one or more additional building block tags, if present) encodes for the identity of the library. In some embodiments, the oligonucleotide (e.g., the headpiece, the first building block tag, the second building block tag, and/or one or more additional building block tags, if present) includes a first library-identifying sequence, where the sequence encodes for the identity of the first library. In particular embodiments, the oligonucleotide is a first library-identifying tag. In some embodiments, the method includes providing a first library-identifying tag, where the tag includes a sequence that encodes for a first library, and/or binding the first library-identifying tag to the complex. In some embodiments, the method includes providing a second library and combining the first library with a second library. In further embodiments, the method includes providing a second library-identifying tag, where the tag includes a sequence that encodes for a second library.

In any of the above embodiments, an oligonucleotide (e.g., a headpiece and/or one or more building blocks) encodes for the use of the member of the library (e.g., use in a selection step or a binding step, as described herein). In some embodiments, the oligonucleotide (e.g., the headpiece, the first building block tag, the second building block tag, and/or one or more additional building block tags, if present) includes a use sequence, where the sequence encodes for use of a subset of members in the library in one or more steps (e.g., a selection step and/or a binding step). In particular embodiments, the oligonucleotide is a use tag including a use sequence. In some embodiments, an oligonucleotide (e.g., a headpiece and/or one or more building blocks) encodes for the origin of the member of the library (e.g., in a particular part of the library). In some embodiments, the oligonucleotide (e.g., the headpiece, the first building block tag, the second building block tag, and/or one or more additional building block tags, if present) includes an origin sequence (e.g., a random degenerate sequence having a length of about 10, 9, 8, 7, or 6 nucleotides), where the sequence encodes for the origin of the member in the library. In particular embodiments, the oligonucleotide is an origin tag including an origin sequence. In some embodiments, the method further includes joining, binding, or operatively associating a use tag and/or an origin tag to the complex.

In any of the above embodiments, the methods, compositions, and complexes optionally include a tailpiece, where the tailpiece includes one or more of a library-identifying sequence, a use sequence, or an origin sequence, as described herein. In particular embodiments, the methods further include joining, binding, or operatively associating the tailpiece (e.g., including one or more of a library-identifying sequence, a use sequence, or an origin sequence) to the complex.

In any of the above embodiments, the methods, compositions, and complexes, or portions thereof (e.g., the headpiece, the first building block tag, the second building block tag, and/or the one or more additional building block tags, if present), include a modified phosphate group (e.g., a phosphorothioate or a 5′-N-phosphoramidite linkage) between the terminal nucleotide at the 3′-terminus and the nucleotide adjacent to the terminal nucleotide. In particular embodiments, the modified phosphate group minimizes shuffling during enzymatic ligation between two oligonucleotides (e.g., minimizes inclusion of an additional nucleotide or excision of a nucleotide in the final product or complex, as compared to the sequences of two oligonucleotides to be ligated, such as between a headpiece and a building block tag or between a first building block tag and a second building block tag), as compared to ligation between two oligonucleotides (e.g., a headpiece and a building block tag or a first building block tag and a second building block tag) lacking the modified phosphate group. In some embodiments, the complex may include a phosphorothioate or a triazole group.

In any of the above embodiments, the methods, compositions, and complexes, or portions thereof (e.g., the headpiece, the first building block tag, the second building block tag, and/or the one or more additional building block tags, if present), include a modification that supports solubility in semi-, reduced-, or non-aqueous (e.g., organic) conditions. In some embodiments, the bifunctional linker, headpiece, or one or more building block tags is modified to increase solubility of a member of said DNA-encoded chemical library in organic conditions. In some embodiments, the modification is one or more of an alkyl chain, a polyethylene glycol unit, a branched species with positive charges, or a hydrophobic ring structure. In some embodiments, the modification includes one or more modified nucleotides having a hydrophobic moiety (e.g., modified at the C5 positions of T or C bases with aliphatic chains, such as in 5′-dimethoxytrityl-N4-diisobutylaminomethylidene-5-(1-propynyl)-2′-deoxycytidine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-(1-propynyl)-2′-deoxyuridine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-fluoro-2′-deoxyuridine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; and 5′-dimethoxytrityl-5-(pyren-1-yl-ethynyl)-2′-deoxyuridine, or 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite) or an insertion having a hydrophobic moiety (e.g., an azobenzene). In some embodiments, the member of the library has an octanol:water coefficient from about 1.0 to about 2.5 (e.g., about 1.0 to about 1.5, about 1.0 to about 2.0, about 1.3 to about 1.5, about 1.3 to about 2.0, about 1.3 to about 2.5, about 1.5 to about 2.0, about 1.5 to about 2.5, or about 2.0 to about 2.5).

In any of the above embodiments, the headpiece, the tailpiece, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present, may include from 5 to 20 nucleotides (e.g., from 5 to 7 nucleotides, from 5 to 8 nucleotides, from 5 to 9 nucleotides, from 5 to 10 nucleotides, from 5 to 11 nucleotides, from 5 to 12 nucleotides, from 5 to 13 nucleotides, from 5 to 14 nucleotides, from 5 to 15 nucleotides, from 5 to 16 nucleotides, from 5 to 17 nucleotides, from 5 to 18 nucleotides, from 5 to 19 nucleotides, from 6 to 7 nucleotides, from 6 to 8 nucleotides, from 6 to 9 nucleotides, from 6 to 10 nucleotides, from 6 to 11 nucleotides, from 6 to 12 nucleotides, from 6 to 13 nucleotides, from 6 to 14 nucleotides, from 6 to 15 nucleotides, from 6 to 16 nucleotides, from 6 to 17 nucleotides, from 6 to 18 nucleotides, from 6 to 19 nucleotides, from 6 to 20 nucleotides, from 7 to 8 nucleotides, from 7 to 9 nucleotides, from 7 to 10 nucleotides, from 7 to 11 nucleotides, from 7 to 12 nucleotides, from 7 to 13 nucleotides, from 7 to 14 nucleotides, from 7 to 15 nucleotides, from 7 to 16 nucleotides, from 7 to 17 nucleotides, from 7 to 18 nucleotides, from 7 to 19 nucleotides, from 7 to 20 nucleotides, from 8 to 9 nucleotides, from 8 to 10 nucleotides, from 8 to 11 nucleotides, from 8 to 12 nucleotides, from 8 to 13 nucleotides, from 8 to 14 nucleotides, from 8 to 15 nucleotides, from 8 to 16 nucleotides, from 8 to 17 nucleotides, from 8 to 18 nucleotides, from 8 to 19 nucleotides, from 8 to 20 nucleotides, from 9 to 10 nucleotides, from 9 to 11 nucleotides, from 9 to 12 nucleotides, from 9 to 13 nucleotides, from 9 to 14 nucleotides, from 9 to 15 nucleotides, from 9 to 16 nucleotides, from 9 to 17 nucleotides, from 9 to 18 nucleotides, from 9 to 19 nucleotides, from 9 to 20 nucleotides, from 10 to 11 nucleotides, from 10 to 12 nucleotides, from 10 to 13 nucleotides, from 10 to 14 nucleotides, from 10 to 15 nucleotides, from 10 to 16 nucleotides, from 10 to 17 nucleotides, from 10 to 18 nucleotides, from 10 to 19 nucleotides, from 10 to 20 nucleotides, from 11 to 12 nucleotides, from 11 to 13 nucleotides, from 11 to 14 nucleotides, from 11 to 15 nucleotides, from 11 to 16 nucleotides, from 11 to 17 nucleotides, from 11 to 18 nucleotides, from 11 to 19 nucleotides, from 11 to 20 nucleotides, from 12 to 13 nucleotides, from 12 to 14 nucleotides, from 12 to 15 nucleotides, from 12 to 16 nucleotides, from 12 to 17 nucleotides, from 12 to 18 nucleotides, from 12 to 19 nucleotides, from 12 to 20 nucleotides, from 13 to 14 nucleotides, from 13 to 15 nucleotides, from 13 to 16 nucleotides, from 13 to 17 nucleotides, from 13 to 18 nucleotides, from 13 to 19 nucleotides, from 13 to 20 nucleotides, from 14 to 15 nucleotides, from 14 to 16 nucleotides, from 14 to 17 nucleotides, from 14 to 18 nucleotides, from 14 to 19 nucleotides, from 14 to 20 nucleotides, from 15 to 16 nucleotides, from 15 to 17 nucleotides, from 15 to 18 nucleotides, from 15 to 19 nucleotides, from 15 to 20 nucleotides, from 16 to 17 nucleotides, from 16 to 18 nucleotides, from 16 to 19 nucleotides, from 16 to 20 nucleotides, from 17 to 18 nucleotides, from 17 to 19 nucleotides, from 17 to 20 nucleotides, from 18 to 19 nucleotides, from 18 to 20 nucleotides, and from 19 to 20 nucleotides). In particular embodiments, the headpiece, the first building block tag, the second building block tag, the one or more additional building block tags, the library-identifying tag, the use tag, and/or the origin tag, if present, have a length of less than 20 nucleotides (e.g., less than 19 nucleotides, less than 18 nucleotides, less than 17 nucleotides, less than 16 nucleotides, less than 15 nucleotides, less than 14 nucleotides, less than 13 nucleotides, less than 12 nucleotides, less than 11 nucleotides, less than 10 nucleotides, less than 9 nucleotides, less than 8 nucleotides, or less than 7 nucleotides).
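By way of non-limiting illustration only, the nested ranges recited above can be generated programmatically. The following Python sketch assumes two hypothetical helpers, `sub_ranges` and `length_in_range`, which are not part of the disclosure. As written, `sub_ranges` returns every pair of integer endpoints within the broad range, including the broad range itself.

```python
from itertools import combinations


def sub_ranges(low: int, high: int) -> list[tuple[int, int]]:
    """Return every (start, end) pair with low <= start < end <= high."""
    # combinations() yields pairs in ascending order, so start < end always holds.
    return list(combinations(range(low, high + 1), 2))


def length_in_range(sequence: str, low: int = 5, high: int = 20) -> bool:
    """Check whether an oligonucleotide length falls within [low, high]."""
    return low <= len(sequence) <= high


if __name__ == "__main__":
    ranges = sub_ranges(5, 20)
    print(len(ranges), "nested ranges, e.g.", ranges[:3])
    print(length_in_range("ACGTACGT"))  # an 8-mer, illustrative sequence only
```

The same helper applies to the narrower 8 to 20 nucleotide range by calling `sub_ranges(8, 20)`.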

In particular embodiments, the first building block tag and the second building block tag include the same number of nucleotides. In other embodiments, either the first building block tag or the second building block tag includes more than 8 nucleotides (e.g., more than 9 nucleotides, more than 10 nucleotides, more than 11 nucleotides, more than 12 nucleotides, more than 13 nucleotides, more than 14 nucleotides, and more than 15 nucleotides). In some embodiments, the first building block tag is a donor tag (e.g., as defined herein) having from 8 to 20 nucleotides (e.g., from 8 to 9 nucleotides, from 8 to 10 nucleotides, from 8 to 11 nucleotides, from 8 to 12 nucleotides, from 8 to 13 nucleotides, from 8 to 14 nucleotides, from 8 to 15 nucleotides, from 8 to 16 nucleotides, from 8 to 17 nucleotides, from 8 to 18 nucleotides, from 8 to 19 nucleotides, from 8 to 20 nucleotides, from 9 to 10 nucleotides, from 9 to 11 nucleotides, from 9 to 12 nucleotides, from 9 to 13 nucleotides, from 9 to 14 nucleotides, from 9 to 15 nucleotides, from 9 to 16 nucleotides, from 9 to 17 nucleotides, from 9 to 18 nucleotides, from 9 to 19 nucleotides, from 9 to 20 nucleotides, from 10 to 11 nucleotides, from 10 to 12 nucleotides, from 10 to 13 nucleotides, from 10 to 14 nucleotides, from 10 to 15 nucleotides, from 10 to 16 nucleotides, from 10 to 17 nucleotides, from 10 to 18 nucleotides, from 10 to 19 nucleotides, from 10 to 20 nucleotides, from 11 to 12 nucleotides, from 11 to 13 nucleotides, from 11 to 14 nucleotides, from 11 to 15 nucleotides, from 11 to 16 nucleotides, from 11 to 17 nucleotides, from 11 to 18 nucleotides, from 11 to 19 nucleotides, from 11 to 20 nucleotides, from 12 to 13 nucleotides, from 12 to 14 nucleotides, from 12 to 15 nucleotides, from 12 to 16 nucleotides, from 12 to 17 nucleotides, from 12 to 18 nucleotides, from 12 to 19 nucleotides, from 12 to 20 nucleotides, from 13 to 14 nucleotides, from 13 to 15 nucleotides, from 13 to 16 nucleotides, from 13 to 17 nucleotides, from 13 to 18 nucleotides, from 13 to 19 nucleotides, from 13 to 20 nucleotides, from 14 to 15 nucleotides, from 14 to 16 nucleotides, from 14 to 17 nucleotides, from 14 to 18 nucleotides, from 14 to 19 nucleotides, from 14 to 20 nucleotides, from 15 to 16 nucleotides, from 15 to 17 nucleotides, from 15 to 18 nucleotides, from 15 to 19 nucleotides, from 15 to 20 nucleotides, from 16 to 17 nucleotides, from 16 to 18 nucleotides, from 16 to 19 nucleotides, from 16 to 20 nucleotides, from 17 to 18 nucleotides, from 17 to 19 nucleotides, from 17 to 20 nucleotides, from 18 to 19 nucleotides, from 18 to 20 nucleotides, and from 19 to 20 nucleotides).

Definitions

By “2′-substituted nucleotide” is meant a nucleotide base having a substitution at the 2′-position of ribose in the base.

By “about” is meant +/−10% of the recited value.
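As a minimal illustration of this definition, the following Python sketch tests whether a value falls within +/−10% of a recited value. The function name `is_about` and the inclusive treatment of the endpoints are assumptions made for illustration and are not stated in the definition.

```python
def is_about(value: float, recited: float, tolerance: float = 0.10) -> bool:
    """Return True if value lies within +/- tolerance (default 10%) of recited."""
    # Inclusive endpoints are an assumption; the definition does not address them.
    return abs(value - recited) <= tolerance * abs(recited)


if __name__ == "__main__":
    # An octanol:water coefficient of 2.3 read against a recited "about 2.5".
    print(is_about(2.3, 2.5))
    print(is_about(2.0, 2.5))
```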

By “bifunctional” is meant having two reactive groups that allow for binding of two chemical moieties. For example, a bifunctional linker is a linker, as described herein, having two reactive groups that allow for binding of a headpiece and a chemical entity.

By “binding” is meant attaching by a covalent bond or a non-covalent bond. Non-covalent bonds include those formed by van der Waals forces, hydrogen bonds, ionic bonds, entrapment or physical encapsulation, absorption, adsorption, and/or other intermolecular forces. Binding can be effectuated by any useful means, such as by enzymatic binding (e.g., enzymatic ligation) or by chemical binding (e.g., chemical ligation).

By “building block” is meant a structural unit of a chemical entity, where the unit is directly linked to other chemical structural units or indirectly linked through the scaffold. When the chemical entity is polymeric or oligomeric, the building blocks are the monomeric units of the polymer or oligomer. Building blocks can have one or more diversity nodes that allow for the addition of one or more other building blocks or scaffolds. In most cases, each diversity node is a functional group capable of reacting with one or more building blocks or scaffolds to form a chemical entity. Generally, the building blocks have at least two diversity nodes (or reactive functional groups), but some building blocks may have one diversity node (or reactive functional group). Alternatively, the encoded chemical or binding steps may include several chemical components (e.g., multi-component condensation reactions or multi-step processes). Reactive groups on two different building blocks should be complementary, i.e., capable of reacting together to form a covalent or a non-covalent bond.

By “building block tag” is meant an oligonucleotide portion of the library that encodes the addition (e.g., by a binding reaction) of a component (i.e., a scaffold or a building block), the headpiece in the library, the identity of the library, the use of the library, and/or the origin of a library member. By “acceptor tag” is meant a building block tag having a reactive entity (e.g., a hydroxyl group at the 3′-terminus in the case of enzymatic ligation). By “donor tag” is meant a building block tag having an entity capable of reacting with the reactive entity on the acceptor tag (e.g., a phosphoryl group at the 5′-terminus in the case of enzymatic ligation).
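The acceptor/donor relationship for enzymatic ligation can be sketched, purely for illustration, as a small data model. In the Python sketch below, the `Tag` class, the end-group labels ("OH", "phosphate", "blocked"), and the example sequences are all hypothetical; the sketch models only the pairing of a 3′-hydroxyl acceptor with a 5′-phosphoryl donor and says nothing about enzymes, yields, or reaction conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    sequence: str      # written 5' to 3'
    five_prime: str    # hypothetical label, e.g. "OH" or "phosphate"
    three_prime: str   # hypothetical label, e.g. "OH" or "blocked"


def can_ligate(acceptor: Tag, donor: Tag) -> bool:
    """Acceptor needs a free 3'-OH; donor needs a 5'-phosphoryl group."""
    return acceptor.three_prime == "OH" and donor.five_prime == "phosphate"


def ligate(acceptor: Tag, donor: Tag) -> Tag:
    """Join the acceptor 3'-terminus to the donor 5'-terminus."""
    if not can_ligate(acceptor, donor):
        raise ValueError("end groups are not compatible for enzymatic ligation")
    # The product keeps the acceptor's 5' end group and the donor's 3' end group.
    return Tag(acceptor.sequence + donor.sequence,
               acceptor.five_prime, donor.three_prime)


if __name__ == "__main__":
    acceptor = Tag("ACGTAC", "OH", "OH")
    donor = Tag("GGTCA", "phosphate", "blocked")
    print(ligate(acceptor, donor))
```

Modeling the end groups explicitly makes it visible why a blocked 3′-terminus on the donor prevents further, unintended extension in this toy model.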

By “chemical entity” is meant a compound comprising one or more building blocks and optionally a scaffold. The chemical entity can be any small molecule or peptide drug or drug candidate designed or built to have one or more desired characteristics, e.g., capacity to bind a biological target, solubility, availability of hydrogen bond donors and acceptors, rotational degrees of freedom of the bonds, positive charge, negative charge, and the like. In certain embodiments, the chemical entity can be reacted further as a bifunctional or trifunctional (or greater) entity.

By “chemically co-reactive pair” is meant a pair of reactive groups that participates in a modular reaction with high yield and a high thermodynamic gain, thus producing a spacer. Exemplary reactions and chemically co-reactive pairs include a Huisgen 1,3-dipolar cycloaddition reaction with a pair of an optionally substituted alkynyl group and an optionally substituted azido group; a Diels-Alder reaction with a pair of an optionally substituted diene having a 4π electron system and an optionally substituted dienophile or an optionally substituted heterodienophile having a 2π electron system; a ring opening reaction with a nucleophile and a strained heterocyclyl electrophile; a splint ligation reaction with a phosphorothioate group and an iodo group; and a reductive amination reaction with an aldehyde group and an amino group, as described herein.

By “complex” or “ligated complex” is meant a headpiece that is operatively associated with a chemical entity and/or one or more oligonucleotide tags by a covalent bond or a non-covalent bond. The complex can optionally include a bifunctional linker between the chemical entity and the headpiece.

By “component” of a chemical entity is meant either a scaffold or a building block.

By “diversity node” is meant a functional group at a position in the scaffold or the building block that allows for adding another building block.

By “headpiece” is meant a starting oligonucleotide for library synthesis that is operatively linked to a component of a chemical entity and to a building block tag. Optionally, a bifunctional linker connects the headpiece to the component.

By “library” is meant a collection of molecules or chemical entities. Optionally, the molecules or chemical entities are bound to one or more oligonucleotides that encode the molecules or portions of the chemical entity.

By “linker” is meant a chemical connecting entity that links the headpiece to a chemical entity.

By “multivalent cation” is meant a cation capable of forming more than one bond with more than one ligand or anion. The multivalent cation can form either an ionic complex or a coordination complex. Exemplary multivalent cations include those from the alkaline earth metals (e.g., magnesium) and transition metals (e.g., manganese (II) or cobalt (III)), and those that are optionally bound to one or more anions and/or one or more univalent or polydentate ligands, such as chloride, amine, and/or ethylenediamine.

By “oligonucleotide” is meant a polymer of nucleotides having a 5′-terminus, a 3′-terminus, and one or more nucleotides at the internal position between the 5′- and 3′-termini. The oligonucleotide may include DNA, RNA, or any derivative thereof known in the art that can be synthesized and used for base-pair recognition. The oligonucleotide does not have to have contiguous bases but can be interspersed with linker moieties. The oligonucleotide polymer may include natural bases (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, deoxycytidine, inosine, or diaminopurine), base analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), modified nucleotides (e.g., 2′-substituted nucleotides, such as 2′-O-methylated bases and 2′-fluoro bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose), and/or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages). Other modified bases are described herein. By “acceptor oligonucleotide” is meant an oligonucleotide having a reactive entity (e.g., a hydroxyl group at the 3′-terminus in the case of enzymatic ligation or an optionally substituted azido group in the case of chemical ligation). By “donor oligonucleotide” is meant an oligonucleotide having an entity capable of reacting with the reactive entity on the acceptor oligonucleotide (e.g., a phosphoryl group at the 5′-terminus in the case of enzymatic ligation or an optionally substituted alkynyl group in the case of chemical ligation).

By “operatively linked” or “operatively associated” is meant that two or more chemical structures are directly or indirectly linked together in such a way as to remain linked through the various manipulations they are expected to undergo. Typically, the chemical entity and the headpiece are operatively linked in an indirect manner (e.g., covalently via an appropriate linker). For example, the linker may be a bifunctional moiety with a site of attachment for the chemical entity and a site of attachment for the headpiece. In addition, the chemical entity and the oligonucleotide tag can be operatively linked directly or indirectly (e.g., covalently via an appropriate linker).

By “protecting group” is meant a group intended to protect the 3′-terminus or 5′-terminus of an oligonucleotide against undesirable reactions during one or more binding steps of tagging a DNA-encoded library. Commonly used protecting groups are disclosed in Greene, “Protective Groups in Organic Synthesis,” 4th Edition (John Wiley & Sons, New York, 2007), which is incorporated herein by reference. Exemplary protecting groups include irreversible protecting groups, such as dideoxynucleotides and dideoxynucleosides (ddNTP or ddN), and, more preferably, reversible protecting groups for hydroxyl groups, such as ester groups (e.g., O-(α-methoxyethyl)ester, O-isovaleryl ester, and O-levulinyl ester), trityl groups (e.g., dimethoxytrityl and monomethoxytrityl), xanthenyl groups (e.g., 9-phenylxanthen-9-yl and 9-(p-methoxyphenyl)xanthen-9-yl), acyl groups (e.g., phenoxyacetyl and acetyl), and silyl groups (e.g., t-butyldimethylsilyl).

By “purifying” is meant removing any unreacted product or any agent present in a reaction mixture that may reduce the activity of a chemical or biological agent to be used in a successive step. Purifying can include one or more of chromatographic separation, electrophoretic separation, and precipitation of the unreacted product or reagent to be removed.

By “scaffold” is meant a chemical moiety that displays one or more diversity nodes in a particular spatial geometry. Diversity nodes are typically attached to the scaffold during library synthesis, but in some cases one diversity node can be attached to the scaffold prior to library synthesis (e.g., addition of one or more building blocks and/or one or more tags). In some embodiments, the scaffold is derivatized such that it can be orthogonally deprotected during library synthesis and subsequently reacted with different diversity nodes.

By “small molecule” drug or “small molecule” drug candidate is meant a molecule that has a molecular weight below about 1,000 Daltons. Small molecules may be organic or inorganic, isolated (e.g., from compound libraries or natural sources), or obtained by derivatization of known compounds.

By “substantial identity” or “substantially identical” is meant a polypeptide or polynucleotide sequence that has the same polypeptide or polynucleotide sequence, respectively, as a reference sequence, or has a specified percentage of amino acid residues or nucleotides, respectively, that are the same at the corresponding location within a reference sequence when the two sequences are optimally aligned. For example, an amino acid sequence that is “substantially identical” to a reference sequence has at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the reference amino acid sequence. For polypeptides, the length of comparison sequences will generally be at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 contiguous amino acids, more preferably at least 25, 50, 75, 90, 100, 150, 200, 250, 300, or 350 contiguous amino acids, and most preferably the full-length amino acid sequence. For nucleic acids, the length of comparison sequences will generally be at least 5 contiguous nucleotides, preferably at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 contiguous nucleotides, and most preferably the full-length nucleotide sequence. Sequence identity may be measured using sequence analysis software on the default setting (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). Such software may match similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications.
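For illustration only, percent identity between two sequences that have already been aligned can be computed as sketched below. This Python sketch is not the software package named above. The function name `percent_identity`, the use of "-" as a gap character, and the choice of denominator (all alignment columns except those where both sequences have a gap) are assumptions, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable positions")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Two hypothetical, pre-aligned 10-nucleotide sequences with one mismatch.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

The sketch does not perform the optimal alignment itself; it assumes that step has been carried out beforehand.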

By “tailpiece” is meant an oligonucleotide portion of the library that is attached to the complex after the addition of all of the building block tags and encodes for the identity of the library, the use of the library, and/or the origin of a library member.

Other features and advantages of the invention will be apparent from the following Detailed Description and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an exemplary method for the general synthesis of chemical libraries using single-stranded DNA tags that are joined sequentially by means of enzymatic and/or chemical ligation. “BB” refers to building block.

FIGS. 2A-2B show exemplary methods for single-stranded DNA tagging of libraries using enzymatic ligation. FIG. 2A shows an exemplary method for tagging libraries using single-stranded enzymatic ligation with a protected (re-installed) 5′-monophosphate (5′-P) oligonucleotide, where gray boxes refer to 2′-OMe nucleotides, “X” refers to a protecting group or a component of a chemical entity, and “PNK” refers to polynucleotide kinase. FIG. 2B shows an exemplary method for tagging libraries using single-stranded ligation with a protected 3′-OH oligonucleotide, where black boxes attached to -O- refer to a protecting group of the 3′-OH terminus and “LC” refers to liquid chromatographic separation of the protecting group.

FIG. 3 shows an exemplary method for tagging libraries using single-stranded ligation with a 5′-preadenylated (labeled “5′-App”) oligonucleotide (headpiece) with a 3′-terminus that is blocked, e.g., by a chemical entity (labeled “X-3′”). This method can be used to ligate a 5′-phosphorylated oligonucleotide tag (labeled “Tag A”) to the headpiece and additional tags having a 3′-OH terminus (labeled “Tag B” and “Tag C”) to the complex in the presence of ATP.

FIGS. 4A-4E show exemplary complexes, each having a headpiece, a linker, and a small molecule including a scaffold (“S”) and diversity nodes A, B, and C. The dark gray boxes refer to 2′-OMe nucleotides, and the dotted lines refer to the presence of one or more complementary bases. FIGS. 4A-4B are schematics for complexes having a single-stranded linear oligonucleotide headpiece, where the linker and small molecule are connected to the 3′-terminus (FIG. 4A) or the 5′-terminus (FIG. 4B) of the headpiece. FIGS. 4C-4D are schematics for complexes having a single-stranded hairpin oligonucleotide headpiece, where the linker and small molecule are connected to the internal position (FIG. 4C) or the 3′-terminus (FIG. 4D) of the headpiece. FIG. 4E shows an exemplary method for tagging libraries having a hairpin oligonucleotide headpiece, where the star refers to a chemical moiety and “Y” at the 3′-terminus refers to a protecting group. Oligonucleotide tags are labeled 1-4, and the adapter sequence is the black line at the 5′-terminus.

FIGS. 5A-5C show oligonucleotide ligation by T4 RNA ligase or CircLigase™ ssDNA ligase. FIG. 5A is a schematic of the enzymatic ligation reaction. The donor oligonucleotide is 5′-phosphorylated and carries a 3′-fluorescein label, imitating a headpiece with a chemical library at the 3′ end. The acceptor oligonucleotide is not phosphorylated. FIG. 5B shows gel electrophoresis analysis of a ligation reaction on an 8M urea/15% polyacrylamide gel (PAAG). “SM” refers to fluorescently labeled donor, “Product” refers to ligation product, and “Adenylated donor” refers to 5′App-Donor, as described above. FIG. 5C shows high yield ligation achieved for T4 RNA ligase at high enzyme and oligonucleotide concentrations.

FIGS. 6A-6B represent optimization of PEG molecular weight (FIG. 6A) and concentration (FIG. 6B) to achieve maximal ligation yield by T4 RNA ligase. Reaction conditions are as described above for FIGS. 5A-5C. FIG. 6A is a graph quantifying the electrophoretic analysis of a ligation reaction with MNA/DNA 15mer donor and acceptor tags after incubation for 5 hours or 20 hours with 25% (w/v) PEG having a molecular weight from 300 to 20,000 (20K). FIG. 6B shows the effect of concentration on ligation after incubation for 18-20 hours in the presence of 5% to 45% (w/v) of PEG4600.

FIGS. 7A-7B show a correlation between ligation efficiency by CircLigase™ (FIG. 7A) and T4 RNA ligase (FIG. 7B) and length of the donor or acceptor oligonucleotides. FIG. 7A depicts a graph quantifying the effect of the acceptor length on ligation yield in the CircLigase™ ligation reaction. FIG. 7B depicts a graph and a table quantifying the effect of nucleotide length of the acceptor and donor MNA/DNA tags on single-stranded ligation with T4 RNA ligase. These data represent an average of two independent experiments obtained by densitometry of fluorescent gels at 450 nm excitation.

FIGS. 8A-8B are LC-MS spectra for a MNA/DNA tag before and after phosphorylation. Data are shown for 15mer tag 5′-HO-mUAC GTA TAC GAC TGmG-OH-3′ (SEQ ID NO: 13) (at 250 μM) before (FIG. 8A) and after (FIG. 8B) reaction with T4 polynucleotide kinase (50 units per 5 nmole of tag).

FIG. 9 shows an electrophoretic gel for sequential single-stranded ligation of tags A-C. The 3′-terminus included fluorescein to represent a library compound (or chemical entity), and the asterisk (*) indicates purification of the ligated product (or complex) prior to phosphorylation.

FIGS. 10A-10B show schematics of a “chemically co-reactive pair” reaction between donor and acceptor oligonucleotides resulting in a 5-atom “short” spacer (FIG. 10A) and a 24-atom “long” spacer (FIG. 10B).

FIGS. 11A-11E show results of reverse transcription (RT) and PCR analysis of 75mer DNA templates containing a short or a long single spacer, as depicted in FIGS. 10A-10B. FIG. 11A is a schematic of the RT reaction. LC-MS spectra of the RT were recorded at both 260 nm and 650 nm for the control 75mer DNA template (FIG. 11B), the 75mer DNA template containing a single 5-atom (“short”) spacer (FIG. 11C), and the 75mer DNA template containing a single 24-atom (“long”) spacer (FIG. 11D). FIG. 11E shows RT-PCR analysis of the control 75mer DNA template (“templ75”), a 75mer DNA template with a 5-atom spacer (“short click”), and a 75mer DNA template with a 24-atom spacer (“long click”).

FIGS. 12A-12G show the results of a chemical ligation reaction between a 5′-iodo-modified DNA oligonucleotide and a 3′-phosphorothioate DNA oligonucleotide in the presence or absence of a complementary splint oligonucleotide. FIG. 12A shows an exemplary schematic of the reaction. The 5′-iodo oligonucleotide is labeled with 6-FAM at the 3′-terminus, while the 3′-phosphorothioate oligonucleotide is labeled with Cy5 at the 5′-terminus. FIG. 12B shows a gel electrophoresis analysis of the ligation reactions in the presence (+spl) or absence (−spl) of a complementary splint. CCy5 and CFL indicate visible bands of Cy5 and fluorescein-labeled starting material, respectively. FIG. 12C shows a time course of the splinted ligation reaction under the above conditions, which was quantified using Cy5 (635 nm) and fluorescein (450 nm) detection. FIG. 12D shows LC-MS analysis of the ligation of CFL and CCy5 in the absence (top, at 260 nm, 495 nm, and 650 nm) and presence (bottom, at 260 nm, 495 nm, and 650 nm) of a splint, where ligation reactions were incubated for seven days. FIG. 12E shows LC-MS analysis of the ligation of CFL and CCy5 in the absence of a splint (at 260 nm, 495 nm, and 650 nm), where ligation reactions were incubated for eight days. FIG. 12F shows MS analysis of reaction of CFL oligonucleotide with piperidine, where this reaction was intended to displace iodine. Reaction conditions included oligonucleotides at 100 μM, piperidine at 40 mM (400 equivalents) in 100 mM borate buffer, pH 9.5, for 20 hrs at room temperature (left); and oligonucleotides at 400 μM, piperidine at 2 M (4,000 equivalents) in 200 mM borate buffer, pH 9.5, for 2 hrs at 65° C. (right). FIG. 12G shows MS analysis of a splinted ligation reaction of CFL and CCy5 oligonucleotides at 50 μM performed in the presence of 400 equivalents of piperidine in 100 mM borate buffer, pH 9.5, for 20 hrs at room temperature.

FIGS. 13A-13C show the use of modified oligonucleotides to minimize shuffling. FIG. 13A shows an LC-MS analysis of a single-stranded ligation reaction of a 5′-phosphorylated headpiece ssHP (3,636 Da) and a tag (tag 15; 2,469 Da) having 2′-O-methyl nucleotides. The LC-MS analysis showed three peaks: peak 1 for the tag (2,469 Da); peak 2 for the adenylated headpiece (3,965 Da); and peak 3 having two (in some instances three) sub-peaks containing products with molecular weights of 6,089 Da (expected ligation product); 5,769 Da (expected 6,089 Da − 320 Da); and 6,409 Da (expected 6,089 Da + 320 Da). This mass difference of 320 Da corresponds exactly to either removal or addition of an extra 2′-O-Me C nucleotide. FIGS. 13B-1 to 13B-3 show a non-limiting, proposed mechanism of the nucleotide shuffling, where about 90% of the reaction provides the expected (normal) ligation product and about 10% of the reaction provides aberrant ligation products (“Product −1 nt” and “Product +1 nt”). FIG. 13C shows an LC-MS analysis of ligation of headpiece HP-PS with tag 15. The headpiece HP-PS has the sequence of the headpiece ssHP but includes a phosphorothioate linkage at the 5′-terminus. LC analysis showed three peaks: peak 1 for the tag (2,469), peak 2 for the adenylated headpiece (3,984), and peak 3 for a single ligation product (6,107) with almost no nucleotide shuffling observed. Traces of +/−320 peaks likely correspond to the oxidative conversion of the phosphorothioate linkage into a native phosphodiester linkage or are due to incomplete sulfurization.
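The mass arithmetic recited for FIG. 13A can be reproduced with a short calculation. The Python sketch below is illustrative only: the masses 6,089 Da and 320 Da are taken from the text, whereas the function names and the 2 Da matching tolerance are assumptions introduced for the example.

```python
from typing import Optional

EXPECTED_PRODUCT_DA = 6089  # expected ligation product, from the text
NUCLEOTIDE_SHIFT_DA = 320   # one 2'-O-Me C nucleotide, from the text


def shuffling_masses(expected: int = EXPECTED_PRODUCT_DA,
                     shift: int = NUCLEOTIDE_SHIFT_DA) -> dict[str, int]:
    """Masses of the normal product and the -1 nt / +1 nt aberrant products."""
    return {
        "Product -1 nt": expected - shift,
        "Product": expected,
        "Product +1 nt": expected + shift,
    }


def label_peak(observed_da: float, tolerance_da: float = 2.0) -> Optional[str]:
    """Assign an observed mass to a product label, or None if nothing matches."""
    # The tolerance is a hypothetical value chosen only for this illustration.
    for label, mass in shuffling_masses().items():
        if abs(observed_da - mass) <= tolerance_da:
            return label
    return None


if __name__ == "__main__":
    print(shuffling_masses())
    print(label_peak(6409))
```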

FIG. 14 is a graph showing separation of library members using size exclusion chromatography, where target-bound library members (left on graph) elute at a shorter time than unbound library members (right on graph).

FIG. 15A is an exemplary schematic showing the chemical ligation of encoding DNA tags using a single chemistry that is not splint-dependent, e.g., 5′-azido/3′-alkynyl. The reactive groups are present on the 3′ and 5′ ends of each tag (Tag A, B, and C), and one of the reactive groups on either end (for example, the 3′ end) is protected to prevent the cyclization, polymerization, or wrong-cycle ligation of the tags. The cycle of tag ligation includes chemical ligation, followed by deprotection of the remaining functional group to render the growing ligated entity competent for the next cycle of ligation. Each cycle also includes addition of one or more building blocks (BBA, BBB, and BBC, which are encoded by Tag A, B, and C, respectively). The chemical ligation process can optionally include addition of a tailpiece.

FIG. 15B is an exemplary schematic showing the chemical ligation of encoding DNA tags using a single chemistry that is splint-dependent. The template-dependent nature of this approach reduces the frequency of occurrence of tag polymerization, tag cyclization, as well as of mistagging events. Similar to FIG. 15A, this schematic includes tags (Tag A, B, and C) and one or more building blocks encoded by tags (BBA, BBB, and BBC).

FIG. 15C is an exemplary schematic showing the use of a succession of chemically ligated tags as a template for template-dependent polymerization, generating cDNA that is competent for PCR amplification and sequencing, as well as using a template-dependent polymerase capable of reading through the chemically ligated junctions.

FIG. 16A is an exemplary schematic showing the chemical ligation of encoding DNA tags using TIPS-protected alkynyl tags and “click” chemistry. Each cycle of library synthesis includes Cu(I)-catalyzed chemical ligation of the TIPS-protected tag to the deprotected alkyne from the previous cycle. After the ligation, the TIPS group is removed (deprotected), thereby activating the alkyne for the next chemical ligation step.

FIG. 16B shows the structure of DMT-succinyl-3′-O-TIPS-propargyl uridine CPG that is used to initiate solid-phase synthesis of oligonucleotides bearing 3′-O-TIPS-propargyl uridine at the 3′-terminus.

FIG. 16C is an exemplary schematic showing the use of a succession of “click” chemically ligated tags as a template for template-dependent polymerization, generating cDNA that is competent for PCR amplification and sequencing, as well as using a template-dependent polymerase capable of reading through the “click” chemically ligated junctions.

FIGS. 17A-17C show the synthesis of 5′-biotinylated, “single-click” templates Y55 and Y185. FIG. 17A provides an exemplary schematic. FIG. 17B and FIG. 17C show LC-MS analysis of Y55 and Y185, respectively.

FIGS. 18A-18C provide an exemplary assay for the “read-through” of a “single-click” template. FIG. 18A shows a schematic, where FAM-labeled primer is annealed to the biotinylated template and is incubated with the template-dependent polymerase, according to the manufacturer's recommended conditions. The complexes are subsequently incubated with streptavidin beads, washed, eluted with NaOH, and then neutralized. After neutralization the samples are analyzed by LC-MS.

FIG. 18B and FIG. 18C show LC-MS data of the Klenow fragment copying of templates Y55 and Y185, respectively.

FIGS. 19A-19D provide the synthesis of 5′-biotinylated “double-click” template YDC and “triple-click” template YTC using a TIPS-protected alkynyl tag. FIGS. 19A and 19B show exemplary schematics for this synthesis. FIGS. 19C and 19D show LC-MS analysis of the YDC and YTC templates, respectively.

FIGS. 20A-20C provide an exemplary click “read-through” assay using “double-click” and “triple-click” templates. FIG. 20A is a schematic, where FAM-labeled primer is annealed to the biotinylated template and is incubated with Klenow fragment of E. coli DNA polymerase I according to the manufacturer's recommended reaction conditions. The complexes are incubated with streptavidin beads, washed, eluted with NaOH, and neutralized. After the neutralization, the samples are assayed by LC-MS. FIGS. 20B and 20C show LC-MS data of the Klenow fragment copying of the templates YDC and YTC, respectively.

FIG. 21 is a graph showing the efficiency of the click “read-through” using “single-click”, “double-click” and “triple-click” templates in comparison to a control “no-click” DNA template. These data were obtained using the “read-through” assay described herein, and the yields were measured by LC-MS analysis by comparison to an internal standard.

FIGS. 22A-22C provide exemplary schematics of chemical ligation with orthogonal chemistry. FIG. 22A is a schematic of the chemical ligation strategy for DNA encoding tags that (i) utilizes two successive orthogonal chemistries for (ii) available read-through strategies. Each tag contains two orthogonal reactive groups, indicated by differing symbols for the 5′-terminus and the 3′-terminus of each tag. In each successive cycle of chemical ligation, an orthogonal chemistry is used. This strategy reduces the frequency of occurrence of mistagging events and may also be used without the protection of the reactive terminal groups. FIG. 22B is a schematic of the template-dependent polymerization “read-through” of a template generated by the orthogonal chemical ligation of orthogonal DNA tags to generate cDNA from which the sequence of the tags can be deduced. FIG. 22C is the same as FIG. 22B but includes a self-priming tailpiece, which may be rendered double-stranded by restriction digestion to facilitate strand-separation during PCR amplification.

FIG. 23 is an exemplary schematic showing the chemical ligation strategy for DNA encoding tags that utilizes two specific successive orthogonal chemistries. Each tag contains click-reactive and phosphorothioate/iodo-reactive groups. Tags bearing orthogonal reactive groups at their 3′ and 5′ ends cannot polymerize and have a reduced frequency of occurrence of mistagging events. Without wishing to be limited, this approach may eliminate the need for the TIPS-protection of the 3′-alkyne. In cycle A, the 5′-iodo/3′-alkynyl tag is ligated using splint-dependent ligation to the 3′-phosphorothioate headpiece, leaving a reactive 3′ alkyne for the next cycle of chemical ligation to a 5′-azido/3′-phosphorothioate tag. The orthogonal ligation cycles may be repeated as many times as is desired.
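The alternation of orthogonal chemistries described for FIG. 23 can be sketched, for illustration only, as a compatibility check over successive tags. In the Python sketch below, the pairing table reflects the two co-reactive pairs named in the text (3′-phosphorothioate with 5′-iodo, and 3′-alkynyl with 5′-azido), while the function name `ligation_plan`, the string labels, and the tuple layout are hypothetical.

```python
# (3' group on the growing complex, 5' group on the incoming tag) -> chemistry
CO_REACTIVE = {
    ("phosphorothioate", "iodo"): "splint-dependent ligation",
    ("alkynyl", "azido"): "click ligation",
}


def ligation_plan(headpiece_3prime: str,
                  tags: list[tuple[str, str]]) -> list[str]:
    """Return the chemistry used in each cycle.

    Each tag is given as (five_prime_group, three_prime_group), in the
    order of addition. A ValueError is raised if a tag cannot react with
    the current 3' end of the growing complex.
    """
    plan = []
    current_3prime = headpiece_3prime
    for cycle, (five_prime, three_prime) in enumerate(tags, start=1):
        chemistry = CO_REACTIVE.get((current_3prime, five_prime))
        if chemistry is None:
            raise ValueError(
                f"cycle {cycle}: 3'-{current_3prime} does not pair with 5'-{five_prime}")
        plan.append(chemistry)
        current_3prime = three_prime  # the new 3' end comes from the added tag
    return plan


if __name__ == "__main__":
    tags = [("iodo", "alkynyl"), ("azido", "phosphorothioate"), ("iodo", "alkynyl")]
    print(ligation_plan("phosphorothioate", tags))
```

In this toy model a tag from the wrong cycle is rejected because its 5′ group has no partner on the current 3′ end, which mirrors the reduced mistagging attributed to the orthogonal strategy.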

FIGS. 24A-24B show the protection and use of 3′-phosphorothioate/5′-iodo groups on DNA tags. FIG. 24A shows an exemplary schematic for using protecting groups (PG) for these tags. FIG. 24B shows an exemplary scheme for use of 3′-phosphorothioate/5′-iodo tags to chemically ligate a succession of encoding DNA tags that encode a chemical library covalently installed upon the 5′-terminus.

FIGS. 25A-25B show the protection and use of 3′-phosphorothioate groups on DNA tags. FIG. 25A shows the scheme for protection of these groups. FIG. 25B shows the scheme for use of 3′-phosphorothioate/5′-azido and 3′-propargyl/5′-iodo tags to chemically ligate a succession of orthogonal encoding DNA tags that encode a chemical library covalently installed upon the 5′-terminus.

DETAILED DESCRIPTION

The invention features methods of using single-stranded ligation to install oligonucleotide tags onto chemical entity-oligonucleotide complexes. This method can be used to create diverse libraries of selectable chemical entities by establishing an encoded relationship between particular tags and particular chemical reactions or building blocks. To identify one or more chemical entities, the oligonucleotide tags can be amplified, cloned, sequenced, and correlated by using the established relationship. In particular, reaction conditions that promote single-stranded ligation of tags were identified. These conditions include the use of one or more 2′-substituted nucleotides (e.g., 2′-O-methyl nucleotides or 2′-fluoro nucleotides) within the tags, the use of tags of particular length (e.g., between 5 and 15 nucleotides), the use of one or more enzymes (e.g., RNA ligase and/or DNA ligase), and/or the use of one or more agents during ligation (e.g., polyethylene glycol and/or a soluble multivalent cation, such as Co(NH₃)₆Cl₃). These methods additionally include methods of chemically joining oligonucleotides, such that the sequence of the joined oligonucleotide product may be utilized as a template for a template-dependent polymerase reaction. Methods of creating and tagging libraries of these complexes are described in detail below.

Methods for Tagging Encoded Libraries

This invention features a method for operatively linking oligonucleotide tags with chemical entities, such that encoding relationships may be established between the sequence of the tag and the structural units (or building blocks) of the chemical entity. In particular, the identity and/or history of a chemical entity can be inferred from the sequence of bases in the oligonucleotide. Using this method, a library including diverse chemical entities or members (e.g., small molecules or peptides) can be addressed with a particular tag sequence.

Generally, these methods include the use of a headpiece, which has at least one functional group that may be elaborated chemically and at least one functional group to which a single-stranded oligonucleotide may be bound (or ligated). Binding can be effectuated by any useful means, such as by enzymatic binding (e.g., ligation with one or more of an RNA ligase and/or a DNA ligase) or by chemical binding (e.g., by a substitution reaction between two functional groups, such as a nucleophile and a leaving group).

To create numerous chemical entities within the library, a solution containing the headpiece can be divided into multiple aliquots and then placed into a multiplicity of physically separate compartments, such as the wells of a multiwell plate. Generally, this is the “split” step. Within each compartment or well, successive chemical reaction and ligation steps are performed with a single-stranded tag within each aliquot. The relationship between the chemical reaction conditions and the sequence of the single-stranded tag is recorded. The reaction and ligation steps may be performed in any order. Then, the reacted and ligated aliquots are combined or “pooled,” and optionally purification may be performed at this point. These split and pool steps can optionally be repeated.
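The split-and-pool bookkeeping described above can be illustrated with a short sketch. It is only a toy model under stated assumptions: the headpiece sequence, the tag sequences, and the building block names are hypothetical, each compartment is assumed to react and ligate to completion, and tags are simply appended to a string without regard to which terminus is ligated.

```python
# Toy model of split-and-pool tagging. All sequences and names are
# hypothetical placeholders, not taken from the specification.
CYCLE_TAGS = [
    {"ACGTAC": "amine_1", "TGCATG": "amine_2"},                    # cycle 1
    {"GATCGA": "acid_1", "CTAGCT": "acid_2", "AGTCAG": "acid_3"},  # cycle 2
]


def split_and_pool(headpiece, cycle_tags):
    """Return {encoding sequence: list of building blocks} for each member."""
    pool = {headpiece: []}
    for tags in cycle_tags:
        next_pool = {}
        # "Split": each (tag, building block) pair is one compartment.
        for tag, block in tags.items():
            for seq, history in pool.items():
                # React with the building block and ligate its tag,
                # recording the relationship between the two.
                next_pool[seq + tag] = history + [block]
        # "Pool": all compartments are recombined before the next cycle.
        pool = next_pool
    return pool


library = split_and_pool("GGCC", CYCLE_TAGS)
print(len(library))                          # 2 x 3 = 6 members
print(library["GGCC" + "ACGTAC" + "CTAGCT"])
```

In this model the returned dictionary plays the role of the recorded encoding relationship: looking up a sequenced tag string recovers the chemical history of the member that carried it.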

Next, the library can be tested and/or selected for a particular characteristic or function, as described herein. For example, the mixture of tagged chemical entities can be separated into at least two populations, where the first population binds to a particular biological target and the second population does not. The first population can then be selectively captured (e.g., by eluting on a column providing the target of interest or by incubating the aliquot with the target of interest) and, optionally, further analyzed or tested, such as with optional washing, purification, negative selection, positive selection, or separation steps.

Finally, the chemical histories of one or more members (or chemical entities) within the selected population can be determined by the sequence of the operatively linked oligonucleotide. Upon correlating the sequence with the particular building block, this method can identify the individual members of the library with the selected characteristic (e.g., an increased tendency to bind to the target protein and thereby elicit a therapeutic effect). For further testing and optimization, candidate therapeutic compounds may then be prepared by synthesizing the identified library members with or without their associated oligonucleotide tags.

FIGS. 1-3 provide various exemplary methods for tagging libraries using single-stranded ligation with a headpiece, where tags can be ligated on the 5′-terminus or the 3′-terminus of the headpiece. To control the order in which the tags are ligated and to reduce side reactions, these methods ensure that only one reactive 5′-terminus and one reactive 3′-terminus are present during ligation. Furthermore, these exemplary methods use 2′-substituted nucleotides (e.g., mixed 2′-deoxy/2′-O-methyl nucleotides) in the tags, and these tags act as templates for a DNA- or RNA-dependent polymerase capable of polymerizing nucleotides in a template-dependent fashion. Without wishing to be limited by theory, the use of one or more 2′-substituted nucleotides (e.g., 2′-O-methyl nucleotides and/or 2′-fluoro nucleotides) within a tag could promote ligation by RNA ligase by more closely resembling RNA, while preserving both the physical and chemical robustness of the recording medium as well as the ability to extract sequence information using template-dependent polymerization.

FIG. 1 provides an exemplary method for reducing side reactions, where the ligated complex and tags are designed to avoid unwanted reactions between reactive 3′-OH and 5′-monophosphate (“5′-P”) groups. In particular, this scheme depicts the phosphorylation-ligation cycle approach. During ligation, only one 3′-OH group (in the tag) and one 5′-P group (in the headpiece) are available, and, thus, only one ligation event is possible. Following the ligation and purification steps, a 5′-OH group is formed in the complex, and this group can be converted into a 5′-P for adding subsequent oligonucleotide tags. The 3′-terminus of the complex is blocked by X, which can be a protecting group or a component of a chemical entity (e.g., optionally including a linker that acts as a spacer between the chemical entity and the headpiece).

As shown in FIG. 1, the exemplary method includes ligation of building block tag 1 (“tag 1”) to the 5′-terminus of the headpiece, thereby creating a complex, and performing successive ligations to the 5′-terminus of the complex. The reactive 5′-terminus is a phosphate group on the complex, and the reactive 3′-terminus is a hydroxyl group on the tags. After the addition of each tag, the ligated complex is separated from the unreacted, unligated headpiece and tags and from other reagents (e.g., phosphate, cobalt, or other reagents present during the ligation step). Separation can be accomplished by any useful method (e.g., by chromatographic or electrophoretic separation of ligated and non-ligated products or by precipitation of a reagent). Then, the ligated complex is exposed to an agent (e.g., a polynucleotide kinase or a chemical phosphorylating agent) to form a phosphate group on the 5′-terminus of the complex. The separation and phosphorylation steps may be performed in either order. In particular, if a kinase is used in the phosphorylation step, the kinase should be inactivated or removed prior to the addition of the subsequent tags that may also contain a 5′-OH group, or any reagents that can inhibit the kinase should be removed from the reaction mixture prior to the phosphorylation step.
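The logic of the phosphorylation-ligation cycle can be sketched as a small state model. This is an illustration only, under assumed names: the `Oligo` class, the `ligate` and `phosphorylate` helpers, and all sequences are hypothetical, and the model tracks only the chemical state of the two termini, not yields, purification, or enzyme behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Oligo:
    seq: str          # written 5' to 3'
    five_prime: str   # "P" (monophosphate) or "OH"
    three_prime: str  # "OH" or "X" (blocked)


def ligate(tag, complex_):
    """Join the tag's 3'-OH to the complex's 5'-P (tag ends up 5' of complex)."""
    if tag.three_prime != "OH" or complex_.five_prime != "P":
        raise ValueError("ligation needs a 3'-OH on the tag and a 5'-P on the complex")
    if tag.five_prime == "P":
        raise ValueError("a 5'-P tag could react with other tags")
    # The product inherits the tag's 5'-terminus and the complex's 3'-terminus.
    return Oligo(tag.seq + complex_.seq, tag.five_prime, complex_.three_prime)


def phosphorylate(oligo):
    """Convert the 5'-OH into a 5'-P (kinase or chemical agent in the text)."""
    return replace(oligo, five_prime="P")


# Hypothetical headpiece: reactive 5'-P, 3'-terminus blocked by X.
complex_ = Oligo("GGCC", "P", "X")
for tag_seq in ["ACGTAC", "GATCGA"]:
    tag = Oligo(tag_seq, "OH", "OH")    # tags carry 5'-OH and 3'-OH
    complex_ = ligate(tag, complex_)    # complex now ends in a 5'-OH
    complex_ = phosphorylate(complex_)  # regenerate the 5'-P for the next tag

print(complex_)
```

Skipping the `phosphorylate` call in this model makes the next `ligate` call fail, which mirrors why the cycle limits each round to a single ligation event.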

In another embodiment, the method includes binding successive tags from the 3′-terminus of the preceding ligated complex. In this method, the ligated complex lacks a reactive 3′-OH group immediately after the ligation step but contains a group that can be converted into a 3′-OH group (e.g., by release of a protecting group). FIG. 2A provides a schematic showing an exemplary method for tagging the 3′-terminus of a complex, and FIG. 2B provides an exemplary reaction scheme for a protected 3′-terminus that contains a group convertible into a 3′-OH group upon release of the 3′-linked protecting group. As shown in FIG. 2A, building block tag 1 (“tag 1”) has a 3′-protected group. In the first step, the exemplary method includes ligation of the tag to the 3′-terminus of the headpiece, thereby creating a complex. Successive ligations are performed to the 3′-terminus of the complex. The reactive 5′-terminus is a phosphate group on the tag, and the reactive 3′-terminus is a hydroxyl group on the complex. After the addition of each tag, the ligated complex is deprotected (e.g., by the addition of a hydrolyzing agent) to release the 3′-protecting group.

In yet another embodiment, the method includes binding successive tags by using a 5′-preadenylated (5′-App) oligonucleotide and a ligase (e.g., T4 RNA ligase). In the presence of ATP, T4 RNA ligase will use the ATP cofactor to form an adenylated intermediate prior to ligation. In the absence of ATP, T4 RNA ligase will only ligate preadenylated oligonucleotides, and possible side reactions with 5′-P oligonucleotides will not occur. Thus, single-stranded ligation with reduced side reactions can be performed with a chemically synthesized 5′-App oligonucleotide in the presence of 5′-monophosphorylated tag, where the 5′-App oligonucleotide can be ligated to a headpiece prior to tagging or to a complex formed after multiple rounds of tagging.

FIG. 3 provides a schematic showing an exemplary method for tagging the 5′-terminus of a preadenylated headpiece. Adenylation of the donor nucleotide at the 5′-phosphate group is the first step in the ligation reaction, and this reaction generally requires one molecule of ATP. In the second step, the 3′-OH group of the acceptor oligonucleotide reacts with the adenylated donor and forms a phosphodiester bond between two oligonucleotides, thus releasing one AMP molecule. The chemically adenylated 5′-phosphate group of the donor oligonucleotide imitates a product of the first step of the ligation reaction and can be ligated to the second oligonucleotide in the absence of ATP. In the following scheme, a 5′-App headpiece is ligated to the 3′-OH group of a 5′-phosphorylated oligonucleotide tag (labeled “Tag A”). Due to the presence of the adenylated 5′-terminus of the oligonucleotide, ligation can occur in the absence of ATP. Under these conditions, the 5′-phosphate group of Tag A does not serve as a ligation donor. Building block Tag B can be ligated by providing an oligonucleotide having a 3′-OH terminus (labeled “Tag B”) in the presence of ATP, and additional tags (labeled “Tag C”) can be included.

In FIG. 3, the 3′-terminus of the headpiece can be blocked with any protecting group (e.g., an irreversible protecting group, such as ddN, or a reversible protecting group). In the first step, the method includes ligation of the tag to the 5′-terminus of the headpiece in the absence of ATP, thereby creating a complex. Successive ligations are performed to the 5′-terminus of the complex in the presence of ATP. This method can be modified in order to perform successive ligation to the 3′-terminus of a complex. For example, the method can include the use of a 5′-preadenylated tag and a headpiece having a reactive 3′-OH terminus. This method may further require blocking the 3′-terminus of the tag to avoid cross-reactions between tags, such as the method described above and in FIG. 2.

The general method provided in FIG. 3 can be modified by replacing the primer with a headpiece. In this case, the headpiece has to be adenylated chemically at the 5′-terminus, and Tag A is phosphorylated at the 5′-terminus. Ligation of this phosphorylated Tag A to the adenylated headpiece occurs under the same standard conditions described herein, but omitting ATP. By using this ligation condition, the ligation of the phosphorylated 5′-terminus can be prevented. In the next step, ligation of Tag B requires that this tag have a free hydroxyl group at the 5′-terminus (i.e., non-phosphorylated). Successive ligation reactions can be performed in the presence of ATP, followed by phosphorylation of the 5′-terminus of the resulting oligonucleotide if further extension of the tags (e.g., Tag C in FIG. 3) is desired.

The methods described herein can include any number of optional steps to diversify the library or to interrogate the members of the library. For any tagging method described herein (e.g., as in FIGS. 1-3), successive “n” number of tags can be added with additional “n” number of ligation, separation, and/or phosphorylation steps. Exemplary optional steps include restriction of library members using one or more restriction endonucleases; ligation of one or more adapter sequences to one or both of the library termini, e.g., one or more adapter sequences to provide a priming sequence for amplification and sequencing or to provide a label, such as biotin, for immobilization of the sequence; reverse-transcription or transcription, optionally followed by reverse-transcription, of the assembled tags in the complex using a reverse transcriptase, transcriptase, or another template-dependent polymerase; amplification of the assembled tags in the complex using, e.g., PCR; generation of clonal isolates of one or more populations of assembled tags in the complex, e.g., by use of bacterial transformation, emulsion formation, dilution, surface capture techniques, etc.; amplification of clonal isolates of one or more populations of assembled tags in the complex, e.g., by using clonal isolates as templates for template-dependent polymerization of nucleotides; and sequence determination of clonal isolates of one or more populations of assembled tags in the complex, e.g., by using clonal isolates as templates for template-dependent polymerization with fluorescently labeled nucleotides. Additional methods for amplifying and sequencing the oligonucleotide tags are described herein.

These methods can be used to identify and discover any number of chemical entities with a particular characteristic or function, e.g., in a selection step. The desired characteristic or function may be used as the basis for partitioning the library into at least two parts with the concomitant enrichment of at least one of the members or related members in the library with the desired function. In particular embodiments, the method comprises identifying a small drug-like library member that binds or inactivates a protein of therapeutic interest. In another embodiment, a sequence of chemical reactions is designed, and a set of building blocks is chosen so that the reaction of the chosen building blocks under the defined chemical conditions will generate a combinatorial plurality of molecules (or a library of molecules), where one or more molecules may have utility as a therapeutic agent for a particular protein. For example, the chemical reactions and building blocks are chosen to create a library having structural groups commonly present in kinase inhibitors. In any of these instances, the tags encode the chemical history of the library member and, in each case, a collection of chemical possibilities may be represented by any particular tag combination.

In one embodiment, the library of chemical entities, or a portion thereof, is contacted with a biological target under conditions suitable for at least one member of the library to bind to the target, followed by removal of library members that do not bind to the target, and analyzing the one or more oligonucleotide tags associated with them. This method can optionally include amplifying the tags by methods known in the art. Exemplary biological targets include enzymes (e.g., kinases, phosphatases, methylases, demethylases, proteases, and DNA repair enzymes), proteins involved in protein:protein interactions (e.g., ligands for receptors), receptor targets (e.g., GPCRs and RTKs), ion channels, bacteria, viruses, parasites, DNA, RNA, prions, and carbohydrates.

In another embodiment, the chemical entities that bind to a target are not subjected to amplification but are analyzed directly. Exemplary methods of analysis include microarray analysis, including evanescent resonance photonic crystal analysis; bead-based methods for deconvoluting tags (e.g., by using his-tags); label-free photonic crystal biosensor analysis (e.g., a BIND® Reader from SRU Biosystems, Inc., Woburn, Mass.); or hybridization-based approaches (e.g., by using arrays of immobilized oligonucleotides complementary to sequences present in the library of tags).

In addition, chemically co-reactive pairs (or functional groups) can be readily included in solid-phase oligonucleotide synthesis schemes and will support the efficient chemical ligation of oligonucleotides. Moreover, the resultant ligated oligonucleotides can act as templates for template-dependent polymerization with one or more polymerases. Accordingly, any of the binding steps described herein for tagging encoded libraries can be modified to include one or more of enzymatic ligation and/or chemical ligation techniques. Exemplary ligation techniques include enzyme ligation, such as use of one or more RNA ligases and/or DNA ligases; and chemical ligation, such as use of chemically co-reactive pairs (e.g., a pair including optionally substituted alkynyl and azido functional groups).

Furthermore, one or more libraries can be combined in a split-and-mix step. In order to permit mixing of two or more libraries, the library member may contain one or more library-identifying sequences, such as in a library-identifying tag, in a ligated building block tag, or as part of the headpiece sequence, as described herein.

Methods Having Reduced Mass

Much of the motivation for single-stranded encoding strategies arises from the reduced mass of a single-stranded tag when compared to a double-stranded tag. Reduced mass potentially confers several benefits including increased solubility, decreased cost, increased reactivity, increased target accessibility, decreased hydrodynamic radius, increased accuracy of analytical assessments, etc. In addition to using a single-stranded tagging methodology, further reductions in mass can be achieved by including the use of one or more of the following: one or more tags having a reduced length, constant mass tag sets, an encoding headpiece, one or more members of a library lacking a primer binding region and/or a constant region, one or more members of a library having a reduced constant region, or any other methodologies described herein.

To minimize the mass of the members in the library, the length of one or more building block tags can be reduced, such as to a length that is as short as possible to encode each split size. In particular, the tags can be less than 20 nucleotides (e.g., less than 19 nucleotides, less than 18 nucleotides, less than 17 nucleotides, less than 16 nucleotides, less than 15 nucleotides, less than 14 nucleotides, less than 13 nucleotides, less than 12 nucleotides, less than 11 nucleotides, less than 10 nucleotides, less than 9 nucleotides, less than 8 nucleotides, or less than 7 nucleotides). As described below in the Examples, shorter tags (e.g., about 10 nucleotides or shorter) can be used for tag ligation.
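As a rough illustration of "as short as possible to encode each split size", the sketch below computes the smallest tag length whose sequence space covers a given number of compartments. It assumes an unconstrained four-letter alphabet in which every sequence is usable; the function name is hypothetical, and practical tag sets would likely be longer to allow for error detection, constant mass, or ligation efficiency.

```python
def min_tag_length(split_size, alphabet=4):
    """Smallest length n (at least 1) such that alphabet**n >= split_size."""
    if split_size < 1:
        raise ValueError("split_size must be at least 1")
    length, capacity = 1, alphabet
    while capacity < split_size:
        capacity *= alphabet
        length += 1
    return length


# Example split sizes (hypothetical plate formats).
for wells in (4, 96, 384, 1000):
    print(wells, min_tag_length(wells))
```

Integer multiplication is used instead of a logarithm so that exact powers of four are not affected by floating-point rounding.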

Constant mass strategies can also be used, which could aid in analysis during library synthesis. In addition, constant mass tag sets could permit the recognition of all single error occurrences (e.g., errors arising from misreading a sequence or from chemical or enzymatic ligation of a tag) and most multiple error occurrences. The relationship between the length of a constant mass single-stranded tag set and encoding ability (e.g., minimum lengths to support specific building block split sizes or library identities, etc.) is outlined below in Table 1. Accordingly, constant mass tag sets could be used to provide beneficial encoding ability, while maintaining error recognition during library formation.

TABLE 1

Length  Base #1  Base #2  Base #3  Base #4  Combinations
1       1        0        0        0        1
2       1        1        0        0        2
3       1        1        1        0        6
4       1        1        1        1        24
5       2        1        1        1        60
6       2        2        1        1        180
7       2        2        2        1        630
8       2        2        2        2        2,520
9       3        2        2        2        7,560
10      3        3        2        2        25,200
11      3        3        3        2        92,400
12      3        3        3        3        369,600
13      4        3        3        3        1,201,200
14      4        4        3        3        4,204,200
15      4        4        4        3        15,765,750
16      4        4        4        4        63,063,000
17      5        4        4        4        214,414,200
18      5        5        4        4        771,891,120
19      5        5        5        4        2,933,186,256
20      5        5        5        5        11,732,745,024
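The Combinations column of Table 1 is consistent with a multinomial count: for a tag of length n containing fixed numbers n1, n2, n3, and n4 of the four bases, the number of distinct orderings is n!/(n1! n2! n3! n4!). The sketch below reproduces the tabulated values under that reading. The function names are hypothetical, and the even split of bases is inferred from the table rather than stated in the text.

```python
from math import factorial


def base_counts(length, n_bases=4):
    """Spread `length` positions as evenly as possible over the bases."""
    quotient, remainder = divmod(length, n_bases)
    return [quotient + 1] * remainder + [quotient] * (n_bases - remainder)


def constant_mass_combinations(length):
    """Number of distinct sequences sharing one fixed base composition."""
    total = factorial(length)
    for count in base_counts(length):
        total //= factorial(count)  # each division is exact
    return total


for n in (8, 12, 20):
    print(n, base_counts(n), constant_mass_combinations(n))
```

Because every sequence in such a set has the same base composition, all members have the same mass, which is one way to read the error-recognition property described above: a single substituted base changes the composition and therefore the mass.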

To minimize mass in the library, the headpiece can be used not only to link the chemical moiety and a tag but to also encode for the identity of a particular library or for a particular step. For example, the headpiece can encode information, e.g., a plurality of headpieces that encode the first split(s) or the identity of the library, such as by using a particular sequence related to a specific library.

In addition, primer binding (e.g., constant) regions from the library of DNA-encoded chemical entities can be excluded during the selection step(s). Then, these regions can be added after selection by, e.g., single-stranded ligation. One exemplary strategy would include providing a chemical entity at the 5′-terminus of an encoding oligonucleotide, selecting a particular chemical entity based on any useful particular characteristic or function, and ligating a tailpiece oligonucleotide to the 3′-terminus of the encoding oligonucleotide that includes a primer binding sequence and may optionally contain one or more tags, e.g., a “use” tag, an “origin” tag, etc., as described herein. This primer binding sequence could then be used to initiate template-dependent polymerization to generate cDNA (or cRNA) that is complementary to the selected library member. The cDNA or cRNA would then be ligated at its 3′-terminus to an oligonucleotide that contains a primer binding sequence and, now that the encoding information is flanked on both sides by primer binding sequences, the oligonucleotide may be sequenced and/or amplified using established approaches, such as any described herein.

Mass may further be minimized by omitting or reducing the size of one or more constant sequences that separate encoding tags. Single-stranded ligation requires no complementary relationship between the ends to be ligated or between these ends and a splint. Therefore, no fixed sequence is required to support enzymatic ligation. Short fixed regions between tags may be useful for informatic parsing of tags or other in silico deconvolution processes.
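One way short fixed regions could assist informatic parsing is sketched below. The read layout (fixed-length tags separated by one constant spacer), the spacer sequence, and the function name are all assumptions made for illustration; the text does not prescribe a particular layout.

```python
def parse_tags(read, tag_len, spacer):
    """Split a read laid out as tag, spacer, tag, spacer, ... into its tags.

    Raises ValueError if a tag is truncated or a spacer is not found where
    expected, which flags reads with insertions, deletions, or misligation.
    """
    tags = []
    pos = 0
    while pos < len(read):
        tag = read[pos:pos + tag_len]
        if len(tag) < tag_len:
            raise ValueError("truncated tag at position %d" % pos)
        tags.append(tag)
        pos += tag_len
        if pos < len(read):
            if read[pos:pos + len(spacer)] != spacer:
                raise ValueError("expected spacer at position %d" % pos)
            pos += len(spacer)
    return tags


# Hypothetical read: three 6-nucleotide tags separated by a "GG" spacer.
read = "ACGTAC" + "GG" + "GATCGA" + "GG" + "CTAGCT"
print(parse_tags(read, tag_len=6, spacer="GG"))
```

Passing an empty spacer reduces the parser to plain fixed-length slicing, which corresponds to omitting the constant regions entirely at the cost of losing the frame check.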

Oligonucleotide Tags

The oligonucleotide tags described herein (e.g., a building block tag or a portion of a headpiece) can be used to encode any useful information, such as a molecule, a portion of a chemical entity, the addition of a component (e.g., a scaffold or a building block), a headpiece in the library, the identity of the library, the use of one or more library members (e.g., use of the members in an aliquot of a library), and/or the origin of a library member (e.g., by use of an origin sequence).

Any sequence in an oligonucleotide can be used to encode any information. Thus, one oligonucleotide sequence can serve more than one purpose, such as to encode two or more types of information or to provide a starting oligonucleotide that also encodes for one or more types of information. For example, the first building block tag can encode for the addition of a first building block, as well as for the identification of the library. In another example, a headpiece can be used to provide a starting oligonucleotide that operatively links a chemical entity to a building block tag, where the headpiece additionally includes a sequence that encodes for the identity of the library (i.e., the library-identifying sequence). Accordingly, any of the information described herein can be encoded in separate oligonucleotide tags or can be combined and encoded in the same oligonucleotide sequence (e.g., an oligonucleotide tag, such as a building block tag, or a headpiece).

A building block sequence encodes for the identity of a building block and/or the type of binding reaction conducted with a building block. This building block sequence is included in a building block tag, where the tag can optionally include one or more types of sequence described below (e.g., a library-identifying sequence, a use sequence, and/or an origin sequence).

A library-identifying sequence encodes for the identity of a particular library. In order to permit mixing of two or more libraries, a library member may contain one or more library-identifying sequences, such as in a library-identifying tag (i.e., an oligonucleotide including a library-identifying sequence), in a ligated building block tag, in a part of the headpiece sequence, or in a tailpiece sequence. These library-identifying sequences can be used to deduce encoding relationships, where the sequence of the tag is translated and correlated with chemical (synthesis) history information. Accordingly, these library-identifying sequences permit the mixing of two or more libraries together for selection, amplification, purification, sequencing, etc.

A use sequence encodes the history (i.e., use) of one or more library members in an individual aliquot of a library. For example, separate aliquots may be treated with different reaction conditions, building blocks, and/or selection steps. In particular, this sequence may be used to identify such aliquots and deduce their history (use), thereby permitting aliquots of the same library with different histories (uses) (e.g., distinct selection experiments) to be mixed together for selection, amplification, purification, sequencing, etc. These use sequences can be included in a headpiece, a tailpiece, a building block tag, a use tag (i.e., an oligonucleotide including a use sequence), or any other tag described herein (e.g., a library-identifying tag or an origin tag).

An origin sequence is a degenerate (random) oligonucleotide sequence of any useful length (e.g., about six nucleotides) that encodes for the origin of the library member. This sequence serves to stochastically subdivide library members that are otherwise identical in all respects into entities distinguishable by sequence information, such that observations of amplification products derived from unique progenitor templates (e.g., selected library members) can be distinguished from observations of multiple amplification products derived from the same progenitor template (e.g., a selected library member). For example, after library formation and prior to the selection step, each library member can include a different origin sequence, such as in an origin tag. After selection, selected library members can be amplified to produce amplification products, and the portion of the library member expected to include the origin sequence (e.g., in the origin tag) can be observed and compared with the origin sequence in each of the other library members. As the origin sequences are degenerate, each amplification product of each library member should have a different origin sequence. However, an observation of the same origin sequence in the amplification product could indicate a source of error, such as an amplification error or a cyclization error in the sequence that produces repeated sequences, and the starting point or source of these errors can be traced by observing the origin sequence at each step (e.g., at each selection step or amplification step) of using the library. These origin sequences can be included in a headpiece, a tailpiece, a building block tag, an origin tag (i.e., an oligonucleotide including an origin sequence), or any other tag described herein (e.g., a library-identifying tag or a use tag).
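How an origin sequence could separate unique progenitor templates from repeated amplification products is sketched below. The read layout is an assumption made for illustration: each read is taken to be the encoding sequence followed by an origin sequence of known, nonzero length at its 3′ end, and the sequences and function name are hypothetical.

```python
from collections import defaultdict


def count_progenitors(reads, origin_len):
    """Return {encoding: (total reads, distinct origin sequences)}.

    Reads sharing both the encoding and the origin sequence are treated as
    amplification copies of a single progenitor template.
    """
    if origin_len < 1:
        raise ValueError("origin_len must be at least 1")
    totals = defaultdict(int)
    origins = defaultdict(set)
    for read in reads:
        encoding, origin = read[:-origin_len], read[-origin_len:]
        totals[encoding] += 1
        origins[encoding].add(origin)
    return {enc: (totals[enc], len(origins[enc])) for enc in totals}


# Hypothetical reads with a six-nucleotide origin sequence at the 3' end.
reads = (
    ["ACGTACGATCGA" + "AAGTCC"] * 3   # three copies of one progenitor
    + ["ACGTACGATCGA" + "CCTGAA"]     # same member, second progenitor
    + ["TGCATGCTAGCT" + "GGATCA"]     # different member
)
print(count_progenitors(reads, origin_len=6))
```

In this toy data set the first member is observed four times but derives from only two progenitor templates, so its raw read count overstates its enrichment. This simple count ignores sequencing errors within the origin sequence and chance collisions between independent templates.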

Any of the types of sequences described herein can be included in the headpiece. For example, the headpiece can include one or more of a building block sequence, a library-identifying sequence, a use sequence, or an origin sequence.

Any of these sequences described herein can be included in a tailpiece. For example, the tailpiece can include one or more of a library-identifying sequence, a use sequence, or an origin sequence.

These sequences can include any modification described herein for oligonucleotides, such as one or more modifications that promote solubility in organic solvents (e.g., any described herein, such as for the headpiece), that provide an analog of the natural phosphodiester linkage (e.g., a phosphorothioate analog), or that provide one or more non-natural nucleotides (e.g., 2′-substituted nucleotides, such as 2′-O-methylated nucleotides and 2′-fluoro nucleotides, or any described herein).

These sequences can include any characteristics described herein for oligonucleotides. For example, these sequences can be included in a tag that is less than 20 nucleotides (e.g., as described herein). In other examples, the tags including one or more of these sequences have about the same mass (e.g., each tag has a mass that is about +/−10% from the average mass between two or more tags); lack a primer binding (e.g., constant) region; lack a constant region; or have a constant region of reduced length (e.g., a length less than 30 nucleotides, less than 25 nucleotides, less than 20 nucleotides, less than 19 nucleotides, less than 18 nucleotides, less than 17 nucleotides, less than 16 nucleotides, less than 15 nucleotides, less than 14 nucleotides, less than 13 nucleotides, less than 12 nucleotides, less than 11 nucleotides, less than 10 nucleotides, less than 9 nucleotides, less than 8 nucleotides, or less than 7 nucleotides).
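The "about the same mass" criterion (each tag within about +/−10% of the average mass) can be checked with a short calculation. The sketch below is an approximation under stated assumptions: it uses commonly quoted average residue masses for unmodified DNA nucleotides within a strand, ignores terminal groups, and does not account for 2′-O-methyl, 2′-fluoro, or other modifications, which would change the values. The function names are hypothetical.

```python
# Approximate average masses (g/mol) of unmodified DNA residues in a strand.
# These are assumed textbook values; modified nucleotides would differ.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def tag_mass(seq):
    """Approximate mass of a tag as the sum of its residue masses."""
    return sum(RESIDUE_MASS[base] for base in seq)


def within_mass_tolerance(tags, tolerance=0.10):
    """True if every tag mass lies within `tolerance` of the set average."""
    masses = [tag_mass(tag) for tag in tags]
    mean = sum(masses) / len(masses)
    return all(abs(mass - mean) <= tolerance * mean for mass in masses)


# Tags with identical base composition have essentially identical mass.
print(within_mass_tolerance(["ACGTACGT", "TGCATGCA"]))
# Tags of very different length fall outside the window.
print(within_mass_tolerance(["ACGT", "ACGTACGT"]))
```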

Sequencing strategies for libraries and oligonucleotides of this length may optionally include concatenation or catenation strategies to increase read fidelity or sequencing depth, respectively. In particular, the selection of encoded libraries that lack primer binding regions has been described in the literature for SELEX, such as described in Jarosch et al., Nucleic Acids Res. 34: e86 (2006), which is incorporated herein by reference. For example, a library member can be modified (e.g., after a selection step) to include a first adapter sequence on the 5′-terminus of the complex and a second adapter sequence on the 3′-terminus of the complex, where the first sequence is substantially complementary to the second sequence and results in formation of a duplex. To further improve yield, two fixed dangling nucleotides (e.g., CC) are added to the 5′-terminus. In particular embodiments, the first adapter sequence is 5′-GTGCTGC-3′ (SEQ ID NO: 1), and the second adapter sequence is 5′-GCAGCACCC-3′ (SEQ ID NO: 2).
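The complementarity of the two exemplary adapter sequences can be checked directly. The sketch below tests only Watson-Crick pairing between SEQ ID NO: 1 and SEQ ID NO: 2 as written; the helper name is hypothetical, and how the unpaired nucleotides are positioned in the assembled complex is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


first_adapter = "GTGCTGC"     # SEQ ID NO: 1
second_adapter = "GCAGCACCC"  # SEQ ID NO: 2

# The first seven nucleotides of the second adapter pair with the first
# adapter; the remaining nucleotides are left unpaired.
paired = second_adapter[:len(first_adapter)]
unpaired = second_adapter[len(first_adapter):]

print(reverse_complement(first_adapter))
print(paired == reverse_complement(first_adapter))
print(unpaired)
```

Under this check, the seven-nucleotide first adapter is fully complementary to the 5′ portion of the second adapter, and two C nucleotides remain as the dangling end.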

Headpiece

In the library, the headpiece operatively links each chemical entity to its encoding oligonucleotide tag. Generally, the headpiece is a starting oligonucleotide having two functional groups that can be further derivatized, where the first functional group operatively links the chemical entity (or a component thereof) to the headpiece and the second functional group operatively links one or more tags to the headpiece. A linker can optionally be used as a spacer between the headpiece and the chemical entity.

The functional groups of the headpiece can be used to form a covalent bond with a component of the chemical entity and another covalent bond with a tag. The component can be any part of the small molecule, such as a scaffold having diversity nodes or a building block. Alternatively, the headpiece can be derivatized to provide a linker (i.e., a spacer separating the headpiece from the small molecule to be formed in the library) terminating in a functional group (e.g., a hydroxyl, amine, carboxyl, sulfhydryl, alkynyl, azido, or phosphate group), which is used to form the covalent linkage with a component of the chemical entity. The linker can be attached to the 5′-terminus, at one of the internal positions, or to the 3′-terminus of the headpiece. When the linker is attached to one of the internal positions, the linker can be operatively linked to a derivatized base (e.g., the C5 position of uridine) or placed internally within the oligonucleotide using standard techniques known in the art. Exemplary linkers are described herein.

The headpiece can have any useful structure. The headpiece can be, e.g., 1 to 100 nucleotides in length, preferably 5 to 20 nucleotides in length, and most preferably 5 to 15 nucleotides in length. The headpiece can be single-stranded or double-stranded and can consist of natural or modified nucleotides, as described herein. Particular exemplary embodiments of the headpiece are described in FIGS. 4A-4D. For example, the chemical moiety can be operatively linked to the 3′-terminus (FIG. 4A) or 5′-terminus (FIG. 4B) of the headpiece. In particular embodiments, the headpiece includes a hairpin structure formed by complementary bases within the sequence. For example, the chemical moiety can be operatively linked to the internal position (FIG. 4C), the 3′-terminus (FIG. 4D), or the 5′-terminus of the headpiece.

Generally, the headpiece includes a non-complementary sequence on the 5′- or 3′-terminus that allows for binding an oligonucleotide tag by polymerization, enzymatic ligation, or chemical reaction. In FIG. 4E, the exemplary headpiece allows for ligation of oligonucleotide tags (labeled 1-4), and the method includes purification and phosphorylation steps. After the addition of tag 4, an additional adapter sequence can be added to the 5′-terminus of tag 4. Exemplary adapter sequences include a primer binding sequence or a sequence having a label (e.g., biotin). In cases where many building blocks and corresponding tags are used (e.g., 100 tags), a mix-and-split strategy may be employed during the oligonucleotide synthesis step to create the necessary number of tags. Such mix-and-split strategies for DNA synthesis are known in the art. The resultant library members can be amplified by PCR following selection for binding entities versus a target(s) of interest.

The headpiece or the complex can optionally include one or more primer binding sequences. For example, the headpiece has a sequence in the loop region of the hairpin that serves as a primer binding region for amplification, where the primer binding region has a higher melting temperature for its complementary primer (e.g., which can include flanking identifier regions) than for a sequence in the headpiece. In other embodiments, the complex includes two primer binding sequences (e.g., to enable a PCR reaction) on either side of one or more tags that encode one or more building blocks. Alternatively, the headpiece may contain one primer binding sequence on the 5′- or 3′-terminus. In other embodiments, the headpiece is a hairpin, and the loop region forms a primer binding site or the primer binding site is introduced through hybridization of an oligonucleotide to the headpiece on the 3′ side of the loop. A primer oligonucleotide, containing a region homologous to the 3′-terminus of the headpiece and carrying a primer binding region on its 5′-terminus (e.g., to enable a PCR reaction), may be hybridized to the headpiece and may contain a tag that encodes a building block or the addition of a building block. The primer oligonucleotide may contain additional information, such as a region of randomized nucleotides, e.g., 2 to 16 nucleotides in length, which is included for bioinformatics analysis.

The headpiece can optionally include a hairpin structure, where this structure can be achieved by any useful method. For example, the headpiece can include complementary bases that form intramolecular base pairing partners, such as by Watson-Crick DNA base pairing (e.g., adenine-thymine and guanine-cytosine) and/or by wobble base pairing (e.g., guanine-uracil, inosine-uracil, inosine-adenine, and inosine-cytosine). In another example, the headpiece can include modified or substituted nucleotides that can form higher affinity duplex formations compared to unmodified nucleotides, such modified or substituted nucleotides being known in the art. In yet another example, the headpiece includes one or more crosslinked bases to form the hairpin structure. For example, bases within a single strand or bases in different double strands can be crosslinked, e.g., by using psoralen.
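As an illustration of hairpin formation by complementary bases, the following minimal sketch counts how many consecutive base pairs can form between the two ends of a single strand, using the Watson-Crick and wobble pairs listed above. The function names, the example sequences, and the minimum loop size of three unpaired nucleotides are assumptions made for illustration only; the sketch ignores thermodynamics and modified nucleotides.

```python
# Base pairs named in the text; "I" denotes inosine.
WATSON_CRICK = {("A", "T"), ("G", "C")}
WOBBLE = {("G", "U"), ("I", "U"), ("I", "A"), ("I", "C")}


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """Return True if bases x and y form one of the listed pairs (either order)."""
    pairs = WATSON_CRICK | (WOBBLE if allow_wobble else set())
    return (x, y) in pairs or (y, x) in pairs


def stem_length(seq: str, min_loop: int = 3, allow_wobble: bool = True) -> int:
    """Count consecutive pairs formed from the two termini inward.

    Pairing stops at the first mismatch or when fewer than `min_loop`
    unpaired bases would remain in the loop (min_loop is an assumed value).
    """
    seq = seq.upper()
    i, j = 0, len(seq) - 1
    pairs = 0
    while j - i - 1 >= min_loop and can_pair(seq[i], seq[j], allow_wobble):
        pairs += 1
        i += 1
        j -= 1
    return pairs


# Hypothetical sequences: a 5-bp stem closing a 4-nt loop.
print(stem_length("GCAGCTTTTGCTGC"))
print(stem_length("GIAGCTTTTGCTCC", allow_wobble=True))
print(stem_length("GIAGCTTTTGCTCC", allow_wobble=False))
```

In the second hypothetical sequence, the inosine-cytosine wobble pair extends the stem only when wobble pairing is permitted.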

The headpiece or complex can optionally include one or more labels that allow for detection. For example, the headpiece, one or more oligonucleotide tags, and/or one or more primer sequences can include an isotope, a radioimaging agent, a marker, a tracer, a fluorescent label (e.g., rhodamine or fluorescein), a chemiluminescent label, a quantum dot, or a reporter molecule (e.g., biotin or a his-tag).

In other embodiments, the headpiece or tag may be modified to support solubility in semi-, reduced-, or non-aqueous (e.g., organic) conditions. Nucleotide bases of the headpiece or tag can be rendered more hydrophobic by modifying, for example, the C5 positions of T or C bases with aliphatic chains without significantly disrupting their ability to hydrogen bond to their complementary bases. Exemplary modified or substituted nucleotides are 5′-dimethoxytrityl-N4-diisobutylaminomethylidene-5-(1-propynyl)-2′-deoxycytidine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-(1-propynyl)-2′-deoxyuridine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-fluoro-2′-deoxyuridine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; and 5′-dimethoxytrityl-5-(pyren-1-yl-ethynyl)-2′-deoxyuridine, 3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite.

In addition, the headpiece oligonucleotide can be interspersed with modifications that promote solubility in organic solvents. For example, azobenzene phosphoramidite can introduce a hydrophobic moiety into the headpiece design. Such insertions of hydrophobic amidites into the headpiece can occur anywhere in the molecule. However, the insertion cannot interfere with subsequent tagging using additional DNA tags during the library synthesis or ensuing PCR once a selection is complete or microarray analysis, if used for tag deconvolution. Such additions to the headpiece design described herein would render the headpiece soluble in, for example, 15%, 25%, 30%, 50%, 75%, 90%, 95%, 98%, 99%, or 100% organic solvent. Thus, addition of hydrophobic residues into the headpiece design allows for improved solubility in semi- or non-aqueous (e.g., organic) conditions, while rendering the headpiece competent for oligonucleotide tagging. Furthermore, DNA tags that are subsequently introduced into the library can also be modified at the C5 position of T or C bases such that they also render the library more hydrophobic and soluble in organic solvents for subsequent steps of library synthesis.

In particular embodiments, the headpiece and the first building block tag can be the same entity, i.e., a plurality of headpiece-tag entities can be constructed that all share common parts (e.g., a primer binding region) and all differ in another part (e.g., encoding region). These may be utilized in the “split” step and pooled after the event they are encoding has occurred.

In particular embodiments, the headpiece can encode information, e.g., by including a sequence that encodes the first split(s) step or a sequence that encodes the identity of the library, such as by using a particular sequence related to a specific library.

Enzymatic Ligation and Chemical Ligation Techniques

Various ligation techniques can be used to add scaffolds, building blocks, linkers, building block tags, and/or the headpiece to produce a complex. Accordingly, any of the binding steps described herein can include any useful ligation techniques, such as enzyme ligation and/or chemical ligation. These binding steps can include the addition of one or more building block tags to the headpiece or complex; the addition of a linker to the headpiece; and the addition of one or more scaffolds or building blocks to the headpiece or complex. In particular embodiments, the ligation techniques used for any oligonucleotide provide a resultant product that can be transcribed and/or reverse transcribed to allow for decoding of the library or for template-dependent polymerization with one or more DNA or RNA polymerases.

Generally, enzyme ligation produces an oligonucleotide having a native phosphodiester bond that can be transcribed and/or reverse transcribed. Exemplary methods of enzyme ligation are provided herein and include the use of one or more RNA or DNA ligases, such as T4 RNA ligase, T4 DNA ligase, CircLigase™ ssDNA ligase, CircLigase™ II ssDNA ligase, and ThermoPhage™ ssDNA ligase (Prokazyme Ltd., Reykjavik, Iceland).

Chemical ligation can also be used to produce oligonucleotides capable of being transcribed or reverse transcribed. One benefit of chemical ligation is that solid phase synthesis of such oligonucleotides can be optimized to support efficient ligation yield. However, the efficacy of a chemical ligation technique to provide oligonucleotides capable of being transcribed or reverse transcribed may need to be tested. This efficacy can be tested by any useful method, such as liquid chromatography-mass spectrometry, RT-PCR analysis, and/or PCR analysis. Examples of these methods are provided in Example 5.

In particular embodiments, chemical ligation includes the use of one or more chemically co-reactive pairs to provide a spacer that can be transcribed or reverse transcribed. In particular, reactions suitable for chemically co-reactive pairs are preferred candidates for the cyclization process (Kolb et al., Angew. Chem. Int. Ed., 40:2004-2021 (2001); Van der Eycken et al., QSAR Comb. Sci., 26:1115-1326 (2007)). Exemplary chemically co-reactive pairs are a pair including an optionally substituted alkynyl group and an optionally substituted azido group to form a triazole spacer via a Huisgen 1,3-dipolar cycloaddition reaction; an optionally substituted diene having a 4π electron system (e.g., an optionally substituted 1,3-unsaturated compound, such as optionally substituted 1,3-butadiene, 1-methoxy-3-trimethylsilyloxy-1,3-butadiene, cyclopentadiene, cyclohexadiene, or furan) and an optionally substituted dienophile or an optionally substituted heterodienophile having a 2π electron system (e.g., an optionally substituted alkenyl group or an optionally substituted alkynyl group) to form a cycloalkenyl spacer via a Diels-Alder reaction; a nucleophile (e.g., an optionally substituted amine or an optionally substituted thiol) with a strained heterocyclyl electrophile (e.g., optionally substituted epoxide, aziridine, aziridinium ion, or episulfonium ion) to form a heteroalkyl spacer via a ring opening reaction; a phosphorothioate group with an iodo group, such as in a splinted ligation of an oligonucleotide containing 5′-iodo dT with a 3′-phosphorothioate oligonucleotide; and an aldehyde group and an amino group, such as a reaction of a 3′-aldehyde-modified oligonucleotide, which can optionally be obtained by oxidizing a commercially available 3′-glyceryl-modified oligonucleotide, with a 5′-amino oligonucleotide (i.e., in a reductive amination reaction) or a 5′-hydrazido oligonucleotide.

In other embodiments, chemical ligation includes introducing an analog of the phosphodiester bond, e.g., for post-selection PCR analysis and sequencing. Exemplary analogs of a phosphodiester include a phosphorothioate linkage (e.g., as introduced by use of a phosphorothioate group and a leaving group, such as an iodo group), a phosphoramide linkage, or a phosphorodithioate linkage (e.g., as introduced by use of a phosphorodithioate group and a leaving group, such as an iodo group).

Reaction Conditions to Promote Enzymatic Ligation or Chemical Ligation

The invention also features one or more reaction conditions that promote enzymatic or chemical ligation between the headpiece and a tag or between two tags. These reaction conditions include using modified nucleotides within the tag, as described herein; using donor tags and acceptor tags having different lengths and varying the concentration of the tags; using different types of ligases, as well as combinations thereof (e.g., CircLigase™ DNA ligase and/or T4 RNA ligase), and varying their concentration; using poly ethylene glycols (PEGs) having different molecular weights and varying their concentration; using non-PEG crowding agents (e.g., betaine or bovine serum albumin); varying the temperature and duration for ligation; varying the concentration of various agents, including ATP, Co(NH₃)₆Cl₃, and yeast inorganic pyrophosphatase; using enzymatically or chemically phosphorylated oligonucleotide tags; using 3′-protected tags; and using preadenylated tags. These reaction conditions also include chemical ligations.

The headpiece and/or tags can include one or more modified or substituted nucleotides. In preferred embodiments, the headpiece and/or tags include one or more modified or substituted nucleotides that promote enzymatic ligation, such as 2′-O-methyl nucleotides (e.g., 2′-O-methyl guanine or 2′-O-methyl uracil), 2′-fluoro nucleotides, or any other modified nucleotides that are utilized as a substrate for ligation. Alternatively, the headpiece and/or tags are modified to include one or more chemically reactive groups to support chemical ligation (e.g., an optionally substituted alkynyl group and an optionally substituted azido group). Optionally, the tag oligonucleotides are functionalized at both termini with chemically reactive groups, and, optionally, one of these termini is protected, such that the groups may be addressed independently and side-reactions may be reduced (e.g., reduced polymerization side-reactions).

Enzymatic ligation can include one or more ligases. Exemplary ligases include CircLigase™ ssDNA ligase (EPICENTRE Biotechnologies, Madison, Wis.), CircLigase™ II ssDNA ligase (also from EPICENTRE Biotechnologies), ThermoPhage™ ssDNA ligase (Prokazyme Ltd., Reykjavik, Iceland), T4 RNA ligase, and T4 DNA ligase. In preferred embodiments, ligation includes the use of an RNA ligase or a combination of an RNA ligase and a DNA ligase. Ligation can further include one or more soluble multivalent cations, such as Co(NH₃)₆Cl₃, in combination with one or more ligases.

Before or after the ligation step, the complex can be purified for three reasons. First, the complex can be purified to remove unreacted headpiece or tags that may result in cross-reactions and introduce “noise” into the encoding process. Second, the complex can be purified to remove any reagents or unreacted starting material that can inhibit or lower the ligation activity of a ligase. For example, phosphate may result in lowered ligation activity. Third, entities that are introduced into a chemical or ligation step may need to be removed to enable the subsequent chemical or ligation step. Methods of purifying the complex are described herein.

Enzymatic and chemical ligation can include poly ethylene glycol having an average molecular weight of more than 300 Daltons (e.g., more than 600 Daltons, 3,000 Daltons, 4,000 Daltons, or 4,500 Daltons). In particular embodiments, the poly ethylene glycol has an average molecular weight from about 3,000 Daltons to 9,000 Daltons (e.g., from 3,000 Daltons to 8,000 Daltons, from 3,000 Daltons to 7,000 Daltons, from 3,000 Daltons to 6,000 Daltons, and from 3,000 Daltons to 5,000 Daltons). In preferred embodiments, the poly ethylene glycol has an average molecular weight from about 3,000 Daltons to about 6,000 Daltons (e.g., from 3,300 Daltons to 4,500 Daltons, from 3,300 Daltons to 5,000 Daltons, from 3,300 Daltons to 5,500 Daltons, from 3,300 Daltons to 6,000 Daltons, from 3,500 Daltons to 4,500 Daltons, from 3,500 Daltons to 5,000 Daltons, from 3,500 Daltons to 5,500 Daltons, and from 3,500 Daltons to 6,000 Daltons, such as 4,600 Daltons). Poly ethylene glycol can be present in any useful amount, such as from about 25% (w/v) to about 35% (w/v), such as 30% (w/v).
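The weight-per-volume amounts above can be converted into a mass to weigh out and an approximate molar concentration. The following is a minimal sketch of that arithmetic; the function names and the 50 μl reaction volume are hypothetical, and the molarity uses the average molecular weight, so it is only an estimate for a polydisperse polymer.

```python
def peg_mass_mg(volume_ul: float, percent_w_v: float = 30.0) -> float:
    """Mass of PEG (mg) for a reaction of `volume_ul` microliters.

    x% (w/v) means x g per 100 mL, i.e. 10*x micrograms per microliter.
    """
    micrograms_per_ul = percent_w_v * 10.0
    return micrograms_per_ul * volume_ul / 1000.0


def peg_molarity_mm(percent_w_v: float = 30.0, avg_mw_da: float = 4600.0) -> float:
    """Approximate PEG concentration in mM from % (w/v) and average MW."""
    grams_per_liter = percent_w_v * 10.0
    return grams_per_liter / avg_mw_da * 1000.0


# Hypothetical 50 ul reaction with 30% (w/v) PEG of average MW 4,600 Daltons.
print(peg_mass_mg(50.0))               # mg of PEG to include
print(round(peg_molarity_mm(), 1))     # approximate mM
```

Under these assumptions, a 50 μl reaction contains 15 mg of poly ethylene glycol, corresponding to roughly 65 mM for an average molecular weight of 4,600 Daltons.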

In a preferred embodiment of this invention, the building block tags are installed by ligation of a single-stranded oligonucleotide to a single-stranded oligonucleotide using the ligation protocol outlined below:

-   Headpiece: 25 μM (5′-terminus: 5′-monophospho/2′-OMe G, intervening nucleotides: 2′-deoxy, and 3′-terminus: 2′-blocked/3′-blocked)
-   Building Block Tag: 25 μM (5′-terminus: 2′-OMe/5′-OH G, intervening nucleotides: 2′-deoxy, and 3′-terminus: 3′-OH/2′-OMe)
-   Co(NH₃)₆Cl₃: 1 mM
-   PEG 4600: 30% (w/v)
-   T4 RNA Ligase (Promega): 1.5 units/μl
-   Yeast Inorganic Pyrophosphatase: 0.0025 units/μl
-   Tris: 50 mM
-   MgCl₂: 10 mM
-   ATP: 1 mM
-   pH: 7.5
-   Water: Balance

In further embodiments, the protocol includes incubation at 37° C. for 20 hours. For the purposes of actual library construction, higher concentrations of headpiece, tags, and/or ligase may be used, and such modifications to these concentrations would be apparent to those skilled in the art.

Methods for Encoding Chemical Entities within a Library

The methods of the invention can be used to synthesize a library having a diverse number of chemical entities that are encoded by oligonucleotide tags. Examples of building blocks and encoding DNA tags are found in U.S. Patent Application Publication No. 2007/0224607, hereby incorporated by reference.

Each chemical entity is formed from one or more building blocks and optionally a scaffold. The scaffold serves to provide one or more diversity nodes in a particular geometry (e.g., a triazine to provide three nodes spatially arranged around a heteroaryl ring or a linear geometry).

The building blocks and their encoding tags can be added directly or indirectly (e.g., via a linker) to the headpiece to form a complex. When the headpiece includes a linker, the building block or scaffold is added to the end of the linker. When the linker is absent, the building block can be added directly to the headpiece or the building block itself can include a linker that reacts with a functional group of the headpiece. Exemplary linkers and headpieces are described herein.

The scaffold can be added in any useful way. For example, the scaffold can be added to the end of the linker or the headpiece, and successive building blocks can be added to the available diversity nodes of the scaffold. In another example, building block A_(n) is first added to the linker or the headpiece, and then the diversity node of scaffold S is reacted with a functional group in building block A_(n). Oligonucleotide tags encoding a particular scaffold can optionally be added to the headpiece or the complex. For example, S_(n) is added to the complex in n reaction vessels, where n is an integer of more than one, and tag S_(n) (i.e., tag S₁, S₂, . . . S_(n−1), S_(n)) is bound to the functional group of the complex.

Building blocks can be added in multiple, synthetic steps. For example, an aliquot of the headpiece, optionally having an attached linker, is separated into n reaction vessels, where n is an integer of two or greater. In the first step, building block A_(n) is added to each of the n reaction vessels (i.e., building block A₁, A₂, . . . A_(n−1), A_(n) is added to reaction vessel 1, 2, . . . n−1, n), where n is an integer and each building block A_(n) is unique. In the second step, scaffold S is added to each reaction vessel to form an A_(n)-S complex. Optionally, scaffold S_(n) can be added to each reaction vessel to form an A_(n)-S_(n) complex, where n is an integer of more than two, and each scaffold S_(n) can be unique. In the third step, building block B_(n) is added to each of the n reaction vessels containing the A_(n)-S complex (i.e., building block B₁, B₂, . . . B_(n−1), B_(n) is added to reaction vessel 1, 2, . . . n−1, n containing the A₁-S, A₂-S, . . . A_(n−1)-S, A_(n)-S complex), where each building block B_(n) is unique. In further steps, building block C_(n) can be added to each of the n reaction vessels containing the B_(n)-A_(n)-S complex (i.e., building block C₁, C₂, . . . C_(n−1), C_(n) is added to reaction vessel 1, 2, . . . n−1, n containing the B₁-A₁-S . . . B_(n)-A_(n)-S complex), where each building block C_(n) is unique. The resulting library will have n³ complexes having n³ tags. In this manner, additional synthetic steps can be used to bind additional building blocks to further diversify the library.

After forming the library, the resultant complexes can optionally be purified and subjected to a polymerization or ligation reaction using one or more primers. This general strategy can be expanded to include additional diversity nodes and building blocks (e.g., D, E, F, etc.). For example, the first diversity node is reacted with building blocks and/or S and encoded by an oligonucleotide tag. Then, additional building blocks are reacted with the resultant complex, and the subsequent diversity node is derivatized by additional building blocks, which are encoded by the primer used for the polymerization or ligation reaction.

To form an encoded library, oligonucleotide tags are added to the complex after or before each synthetic step. For example, before or after the addition of building block A_(n) to each reaction vessel, tag A_(n) is bound to the functional group of the headpiece (i.e., tag A₁, A₂, . . . A_(n−1), A_(n) is added to reaction vessel 1, 2, . . . n−1, n containing the headpiece). Each tag A_(n) has a distinct sequence that correlates with each unique building block A_(n), and determining the sequence of tag A_(n) provides the chemical structure of building block A_(n). In this manner, additional tags are used to encode for additional building blocks or additional scaffolds.
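The correspondence between tag sequences and building blocks can be represented as a lookup table per synthetic step. The following minimal sketch decodes a concatenated tag read into building block identities; the tag sequences, the fixed six-nucleotide tag length, the building block labels, and the function name are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical codebooks: one per synthetic step, mapping tag -> building block.
TAGS_A: Dict[str, str] = {"ACGTAC": "A1", "TGCATG": "A2"}
TAGS_B: Dict[str, str] = {"GGATCC": "B1", "CCTAGG": "B2"}


def decode_read(read: str, codebooks: Sequence[Dict[str, str]],
                tag_len: int = 6) -> List[str]:
    """Decode a read made of consecutive fixed-length tags, one per step."""
    if len(read) != tag_len * len(codebooks):
        raise ValueError("read length does not match the number of encoded steps")
    blocks = []
    for step, book in enumerate(codebooks):
        tag = read[step * tag_len:(step + 1) * tag_len].upper()
        if tag not in book:
            raise KeyError(f"unknown tag {tag!r} at step {step + 1}")
        blocks.append(book[tag])
    return blocks


print(decode_read("TGCATGGGATCC", [TAGS_A, TAGS_B]))
```

A practical decoder would also need to handle constant regions, primer binding sequences, and sequencing errors, none of which is modeled here.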

Furthermore, the last tag added to the complex can either include a primer sequence or provide a functional group to allow for binding (e.g., by ligation) of a primer sequence. The primer sequence can be used for amplifying and/or sequencing the oligonucleotide tags of the complex. Exemplary methods for amplifying and for sequencing include polymerase chain reaction (PCR), linear chain amplification (LCR), rolling circle amplification (RCA), or any other method known in the art to amplify or determine nucleic acid sequences.

Using these methods, large libraries can be formed having a large number of encoded chemical entities. For example, a headpiece is reacted with a linker and building block A_(n), which includes 1,000 different variants (i.e., n=1,000). For each building block A_(n), a DNA tag A_(n) is ligated or primer extended to the headpiece. These reactions may be performed in a 1,000-well plate or 10×100 well plates. All reactions may be pooled, optionally purified, and split into a second set of plates. Next, the same procedure may be performed with building block B_(n), which also includes 1,000 different variants. A DNA tag B_(n) may be ligated to the A_(n)-headpiece complex, and all reactions may be pooled. The resultant library includes 1,000×1,000 combinations of A_(n)×B_(n) (i.e., 1,000,000 compounds) tagged by 1,000,000 different combinations of tags. The same approach may be extended to add building blocks C_(n), D_(n), E_(n), etc. The generated library may then be used to identify compounds that bind to the target. The structure of the chemical entities that bind to the target can optionally be assessed by PCR and sequencing of the DNA tags to identify the compounds that were enriched.
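The growth in library size with each split-and-pool cycle is a simple product of the number of variants per cycle. The following minimal sketch computes that product and enumerates the members of a small library; the function names and the toy building block labels are hypothetical.

```python
from itertools import product
from math import prod
from typing import Iterable, Iterator, Sequence, Tuple


def library_size(variants_per_cycle: Sequence[int]) -> int:
    """Number of encoded compounds after pooling and splitting between cycles."""
    return prod(variants_per_cycle)


def enumerate_members(*cycles: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every combination of one building block (or tag) per cycle."""
    return product(*cycles)


# Two cycles of 1,000 variants each, as in the example above.
print(library_size([1000, 1000]))

# A toy library small enough to list explicitly (hypothetical labels).
for member in enumerate_members(["A1", "A2"], ["B1", "B2", "B3"]):
    print("-".join(member))
```

Adding a third cycle of 1,000 variants under the same assumptions would give 1,000,000,000 combinations.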

This method can be modified to avoid tagging after the addition of each building block or to avoid pooling (or mixing). For example, the method can be modified by adding building block A_(n) to n reaction vessels, where n is an integer of more than one, and adding the identical building block B₁ to each reaction well. Here, B₁ is identical for each chemical entity, and, therefore, an oligonucleotide tag encoding this building block is not needed. After adding a building block, the complexes may be pooled or not pooled. For example, the library is not pooled following the final step of building block addition, and the pools are screened individually to identify compound(s) that bind to a target. To avoid pooling all of the reactions after synthesis, a BIND® Reader (from SRU Biosystems, Inc.), for example, may be used to monitor binding on a sensor surface in high throughput format (e.g., 384 well plates and 1,536 well plates). For example, building block A_(n) may be encoded with DNA tag A_(n), and building block B_(n) may be encoded by its position within the well plate. Candidate compounds can then be identified by using a binding assay (e.g., using a BIND® Biosensor, also available from SRU Biosystems, Inc., or using an ELISA assay) and by analyzing the A_(n) tags by sequencing, microarray analysis, and/or restriction digest analysis. This analysis allows for the identification of combinations of building blocks A_(n) and B_(n) that produce the desired molecules.

The method of amplifying can optionally include forming a water-in-oil emulsion to create a plurality of aqueous microreactors. The reaction conditions (e.g., concentration of complex and size of microreactors) can be adjusted to provide, on average, a microreactor having at least one member of a library of compounds. Each microreactor can also contain the target, a single bead capable of binding to a complex or a portion of the complex (e.g., one or more tags) and/or binding the target, and an amplification reaction solution having one or more necessary reagents to perform nucleic acid amplification. After amplifying the tag in the microreactors, the amplified copies of the tag will bind to the beads in the microreactors, and the coated beads can be identified by any useful method.
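One way to reason about adjusting the concentration of complex and the size of the microreactors is to treat the number of library members per microreactor as Poisson distributed. This statistical model is an assumption introduced here for illustration and is not stated in the disclosure; the function names are likewise hypothetical.

```python
import math


def fraction_occupied(mean_per_reactor: float) -> float:
    """Fraction of microreactors with at least one library member (Poisson model)."""
    return 1.0 - math.exp(-mean_per_reactor)


def fraction_single(mean_per_reactor: float) -> float:
    """Fraction of microreactors with exactly one library member (Poisson model)."""
    return mean_per_reactor * math.exp(-mean_per_reactor)


def mean_for_occupancy(target_fraction: float) -> float:
    """Mean members per microreactor needed to reach a target occupied fraction."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


# With an average of one member per microreactor (assumed loading):
print(round(fraction_occupied(1.0), 3))
print(round(fraction_single(1.0), 3))
```

Under this model, an average loading of one member per microreactor leaves roughly a third of the microreactors empty, which illustrates why the loading may need to be tuned for a given application.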

Once the building blocks from the first library that bind to the target of interest have been identified, a second library may be prepared in an iterative fashion. For example, one or two additional nodes of diversity can be added, and the second library is created and sampled, as described herein. This process can be repeated as many times as necessary to create molecules with desired molecular and pharmaceutical properties.

Various ligation techniques can be used to add the scaffold, building blocks, linkers, and building block tags. Accordingly, any of the binding steps described herein can include any useful ligation technique or techniques. Exemplary ligation techniques include enzymatic ligation, such as use of one or more RNA ligases and/or DNA ligases, as described herein; and chemical ligation, such as use of chemically co-reactive pairs, as described herein.

Scaffold and Building Blocks

The scaffold S can be a single atom or a molecular scaffold. Exemplary single atom scaffolds include a carbon atom, a boron atom, a nitrogen atom, or a phosphorus atom, etc. Exemplary polyatomic scaffolds include a cycloalkyl group, a cycloalkenyl group, a heterocycloalkyl group, a heterocycloalkenyl group, an aryl group, or a heteroaryl group. Particular embodiments of a heteroaryl scaffold include a triazine, such as 1,3,5-triazine, 1,2,3-triazine, or 1,2,4-triazine; a pyrimidine; a pyrazine; a pyridazine; a furan; a pyrrole; a pyrroline; a pyrrolidine; an oxazole; a pyrazole; an isoxazole; a pyran; a pyridine; an indole; an indazole; or a purine.

The scaffold S can be operatively linked to the tag by any useful method. In one example, S is a triazine that is linked directly to the headpiece. To obtain this exemplary scaffold, trichlorotriazine (i.e., a chlorinated precursor of triazine having three chlorines) is reacted with a nucleophilic group of the headpiece. Using this method, S has three positions having chlorine that are available for substitution, where two positions are available diversity nodes and one position is attached to the headpiece. Next, building block A_(n) is added to a diversity node of the scaffold, and tag A_(n) encoding for building block A_(n) (“tag A_(n)”) is ligated to the headpiece, where these two steps can be performed in any order. Then, building block B_(n) is added to the remaining diversity node, and tag B_(n) encoding for building block B_(n) is ligated to the end of tag A_(n). In another example, S is a triazine that is operatively linked to the linker of a tag, where trichlorotriazine is reacted with a nucleophilic group (e.g., an amino group) of a PEG, aliphatic, or aromatic linker of a tag. Building blocks and associated tags can be added, as described above.

In yet another example, S is a triazine that is operatively linked to building block A_(n). To obtain this scaffold, building block A_(n) having two diversity nodes (e.g., an electrophilic group and a nucleophilic group, such as an Fmoc-amino acid) is reacted with the nucleophilic group of a linker (e.g., the terminal group of a PEG, aliphatic, or aromatic linker, which is attached to a headpiece). Then, trichlorotriazine is reacted with a nucleophilic group of building block A_(n). Using this method, all three chlorine positions of S are used as diversity nodes for building blocks. As described herein, additional building blocks and tags can be added, and additional scaffolds S_(n) can be added.

Exemplary building block A_(n)'s include, e.g., amino acids (e.g., alpha-, beta-, gamma-, delta-, and epsilon-amino acids, as well as derivatives of natural and unnatural amino acids), chemically co-reactive reactants (e.g., azide or alkyne chains) with an amine, or a thiol reactant, or combinations thereof. The choice of building block A_(n) depends on, for example, the nature of the reactive group used in the linker, the nature of a scaffold moiety, and the solvent used for the chemical synthesis.

Exemplary building block B_(n)'s and C_(n)'s include any useful structural unit of a chemical entity, such as optionally substituted aromatic groups (e.g., optionally substituted phenyl or benzyl), optionally substituted heterocyclyl groups (e.g., optionally substituted quinolinyl, isoquinolinyl, indolyl, isoindolyl, azaindolyl, benzimidazolyl, azabenzimidazolyl, benzisoxazolyl, pyridinyl, piperidyl, or pyrrolidinyl), optionally substituted alkyl groups (e.g., optionally substituted linear or branched C₁₋₆ alkyl groups or optionally substituted C₁₋₆ aminoalkyl groups), or optionally substituted carbocyclyl groups (e.g., optionally substituted cyclopropyl, cyclohexyl, or cyclohexenyl). Particularly useful building block B_(n)'s and C_(n)'s include those with one or more reactive groups, such as an optionally substituted group (e.g., any described herein) having one or more optional substituents that are reactive groups or can be chemically modified to form reactive groups. Exemplary reactive groups include one or more of amine (—NR₂, where each R is, independently, H or an optionally substituted C₁₋₆ alkyl), hydroxy, alkoxy (—OR, where R is an optionally substituted C₁₋₆ alkyl, such as methoxy), carboxy (—COOH), amide, or chemically co-reactive substituents. A restriction site may be introduced, for example, in tag B_(n) or C_(n), where a complex can be identified by performing PCR and restriction digest with one of the corresponding restriction enzymes.

Linkers

The bifunctional linker between the headpiece and the chemical entity can be varied to provide an appropriate spacer and/or to increase the solubility of the headpiece in organic solvent. A wide variety of linkers are commercially available that can couple the headpiece with the small molecule library. The linker typically consists of linear or branched chains and may include a C₁₋₁₀ alkyl, a heteroalkyl of 1 to 10 atoms, a C₂₋₁₀ alkenyl, a C₂₋₁₀ alkynyl, C₅₋₁₀ aryl, a cyclic or polycyclic system of 3 to 20 atoms, a phosphodiester, a peptide, an oligosaccharide, an oligonucleotide, an oligomer, a polymer, or a poly alkyl glycol (e.g., a poly ethylene glycol, such as —(CH₂CH₂O)_(n)CH₂CH₂—, where n is an integer from 1 to 50), or combinations thereof.
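To compare poly ethylene glycol spacers of different lengths, the elemental composition, approximate mass, and backbone length of the (CH₂CH₂O)_(n)CH₂CH₂ fragment can be computed from n. The following is a minimal sketch using standard average atomic masses; the function names are hypothetical, and the values describe only the spacer fragment, not the reactive end groups of a complete linker.

```python
from typing import Dict

# Standard average atomic masses (g/mol).
ATOMIC_MASS: Dict[str, float] = {"C": 12.011, "H": 1.008, "O": 15.999}


def peg_spacer_formula(n: int) -> Dict[str, int]:
    """Element counts of the (CH2CH2O)n-CH2CH2 spacer fragment, n from 1 to 50."""
    if not 1 <= n <= 50:
        raise ValueError("n must be an integer from 1 to 50")
    return {"C": 2 * n + 2, "H": 4 * n + 4, "O": n}


def peg_spacer_mass(n: int) -> float:
    """Approximate average mass (g/mol) of the spacer fragment."""
    return sum(ATOMIC_MASS[el] * count for el, count in peg_spacer_formula(n).items())


def peg_backbone_atoms(n: int) -> int:
    """Number of C and O atoms along the spacer chain (3 per repeat plus 2)."""
    peg_spacer_formula(n)  # reuse the range check
    return 3 * n + 2


for n in (1, 12, 50):
    print(n, peg_spacer_formula(n), round(peg_spacer_mass(n), 3), peg_backbone_atoms(n))
```

Each additional ethylene glycol unit adds about 44 g/mol and three backbone atoms to the spacer.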

The bifunctional linker may provide an appropriate spacer between the headpiece and a chemical entity of the library. In certain embodiments, the bifunctional linker includes three parts. Part 1 may be a reactive group, which forms a covalent bond with DNA, such as, e.g., a carboxylic acid, preferably activated by a N-hydroxy succinimide (NHS) ester to react with an amino group on the DNA (e.g., amino-modified dT), an amidite to modify the 5′ or 3′-terminus of a single-stranded headpiece (achieved by means of standard oligonucleotide chemistry), chemically co-reactive pairs (e.g., azido-alkyne cycloaddition in the presence of Cu(I) catalyst, or any described herein), or thiol reactive groups. Part 2 may also be a reactive group, which forms a covalent bond with the chemical entity, either building block A_(n) or a scaffold. Such a reactive group could be, e.g., an amine, a thiol, an azide, or an alkyne. Part 3 may be a chemically inert spacer of variable length, introduced between Part 1 and 2. Such a spacer can be a chain of ethylene glycol units (e.g., PEGs of different lengths), an alkane, an alkene, a polyene chain, or a peptide chain. The linker can contain branches or inserts with hydrophobic moieties (such as, e.g., benzene rings) to improve solubility of the headpiece in organic solvents, as well as fluorescent moieties (e.g., fluorescein or Cy-3) used for library detection purposes. Hydrophobic residues in the headpiece design may be varied with the linker design to facilitate library synthesis in organic solvents. For example, the headpiece and linker combination is designed to have appropriate residues wherein the octanol:water coefficient (P_(oct)) is from, e.g., 1.0 to 2.5.

Linkers can be empirically selected for a given small molecule library design, such that the library can be synthesized in organic solvent, for example, in 15%, 25%, 30%, 50%, 75%, 90%, 95%, 98%, 99%, or 100% organic solvent. The linker can be varied using model reactions prior to library synthesis to select the appropriate chain length that solubilizes the headpiece in an organic solvent. Exemplary linkers include those having increased alkyl chain length, increased polyethylene glycol units, branched species with positive charges (to neutralize the negative phosphate charges on the headpiece), or increased amounts of hydrophobicity (for example, addition of benzene ring structures).

Examples of commercially available linkers include amino-carboxylic linkers, such as those being peptides (e.g., Z-Gly-Gly-Gly-Osu (N-alpha-benzyloxycarbonyl-(Glycine)₃-N-succinimidyl ester) or Z-Gly-Gly-Gly-Gly-Gly-Gly-Osu (N-alpha-benzyloxycarbonyl-(Glycine)₆-N-succinimidyl ester, SEQ ID NO: 3)), PEG (e.g., Fmoc-aminoPEG2000-NHS or amino-PEG (12-24)-NHS), or alkane acid chains (e.g., Boc-ε-aminocaproic acid-Osu); chemically co-reactive pair linkers, such as those chemically co-reactive pairs described herein in combination with a peptide moiety (e.g., azidohomoalanine-Gly-Gly-Gly-OSu (SEQ ID NO: 4) or propargylglycine-Gly-Gly-Gly-OSu (SEQ ID NO: 5)), PEG (e.g., azido-PEG-NHS), or an alkane acid chain moiety (e.g., 5-azidopentanoic acid, (S)-2-(azidomethyl)-1-Boc-pyrrolidine, 4-azidoaniline, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester); thiol-reactive linkers, such as those being PEG (e.g., SM(PEG)n NHS-PEG-maleimide) or alkane chains (e.g., 3-(pyridin-2-yldisulfanyl)-propionic acid-Osu or sulfosuccinimidyl 6-(3′-[2-pyridyldithio]-propionamido)hexanoate); and amidites for oligonucleotide synthesis, such as amino modifiers (e.g., 6-(trifluoroacetylamino)-hexyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite), thiol modifiers (e.g., S-trityl-6-mercaptohexyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite), or chemically co-reactive pair modifiers (e.g., 6-hexyn-1-yl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite, 3-dimethoxytrityloxy-2-(3-(3-propargyloxypropanamido)propanamido)propyl-1-O-succinoyl, long chain alkylamino CPG, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester).
Additional linkers are known in the art, and those that can be used during library synthesis include, but are not limited to, 5′-O-dimethoxytrityl-1′,2′-dideoxyribose-3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 9-O-dimethoxytrityl-triethyleneglycol,1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 3-(4,4′-dimethoxytrityloxy)propyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; and 18-O-dimethoxytritylhexaethyleneglycol,1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite. Any of the linkers herein can be added in tandem to one another in different combinations to generate linkers of different desired lengths.

Linkers may also be branched, where branched linkers are well known in the art and examples can consist of symmetric or asymmetric doublers or a symmetric trebler. See, for example, Newkome et al., Dendritic Molecules: Concepts, Syntheses, Perspectives, VCH Publishers (1996); Boussif et al., Proc. Natl. Acad. Sci. USA 92:7297-7301 (1995); and Jansen et al., Science 266:1226 (1994).

Example 1 General Strategy to Improve Single-Stranded Ligation of DNA Tags

Various reaction conditions were explored to improve single-stranded ligation of tags to form an encoded library. These reaction conditions included using modified nucleotides within the tag (e.g., use of one or more nucleotides having a 2′-OMe group to form an MNA/DNA tag, where “MNA” refers to an oligonucleotide having at least one 2′-O-methyl nucleotide); using donor tags and acceptor tags having different lengths and varying the concentration of the tags; using different types of ligases, as well as combinations thereof (e.g., CircLigase™ ssDNA ligase and/or T4 RNA ligase), and varying their concentration; purifying the complex by removing unreacted starting materials; using polyethylene glycols (PEGs) having different molecular weights and varying their concentration; varying the temperature and duration for reaction, such as ligation; varying the concentration of various agents, including ATP, Co(NH₃)₆Cl₃, and yeast inorganic pyrophosphatase; using enzymatically or chemically phosphorylated oligonucleotide tags; using 3′-protected tags; and using 5′-chemically adenylated tags.

A thorough analysis of different conditions identified optimal combinations of parameters that provided up to 90% ligation efficiency (e.g., FIG. 5C), as determined by the fraction of ligated final product relative to un-ligated starting reactant (“fraction ligated”). A scheme of the ligation reaction using ligase is shown in FIG. 5A, and a typical denaturing polyacrylamide gel electrophoresis is shown in FIG. 5B. The donor oligonucleotide was labeled at the 3′-terminus and could be detected on a gel by scanning at 450 nm excitation on a Storm™ 800 PhosphorImager. The gel depicts an unligated donor (or starting material) and the ligated product. In particular, the adenylated donor can be resolved and distinguished from the starting material on this gel.

Table 2 provides ligation efficiencies measured as a function of the composition of the oligonucleotide (i.e., oligonucleotides with all DNA nucleotides versus oligonucleotides with at least one 2′-O-methyl nucleotide, labeled “MNA”) and the type of ligase (i.e., RNA ligase versus ssDNA ligase). These ligation experiments included the following tags: an all-DNA donor having the sequence of 5′-P-GCT GTG CAG GTA GAG TGC-6-FAM-3′ (SEQ ID NO: 6); a 5′-MNA-DNA donor having the sequence of 5′-P-mGCT GTG CAG GTA GAG TGC-6-FAM-3′ (SEQ ID NO: 7); an all-MNA donor having the sequence of 5′-P-mGmUmG mCmAmG mGmUmA mGmAmG mUmGmC-6-FAM-3′ (SEQ ID NO: 8); a DNA-3′MNA acceptor having the sequence of 5′-HO-TAC GTA TAC GAC TGmG-OH-3′ (SEQ ID NO: 9); an all-DNA acceptor having the sequence of 5′-HO-GCA GAC TAC GTA TAC GAC TGG-OH-3′ (SEQ ID NO: 10); and an all-MNA acceptor having the sequence of 5′-HO-mUmAmC mGmUmA mUmAmC mGmAmC mUmGmG-OH-3′ (SEQ ID NO: 11), where “m” indicates a 2′-OMe base, “P” indicates a phosphorylated nucleotide, and “FAM” indicates fluorescein.

Ligation efficiencies were calculated from gel densitometry data as the ratio between the intensity from the ligation product and the sum of the intensity from the ligation product and the unligated starting material. The reaction conditions for T4 RNA ligase included the following: 5 μM each of donor and acceptor oligonucleotides (15-18 nucleotides (nts) long) in a buffer solution containing 50 mM Tris HCl, 10 mM MgCl₂, 1 mM hexamine cobalt chloride, 1 mM ATP, 25% PEG4600, and 5 units of T4 RNA ligase (NEB-new units) at pH 7.5. The reactions were incubated at 37° C. for 16 hours. The reaction conditions for CircLigase™ included the following: 5 μM each of donor and acceptor oligonucleotides (length 15 or 18 nts) incubated in a buffer containing 50 mM MOPS (pH 7.5), 10 mM KCl, 5 mM MgCl₂, 1 mM DTT, 0.05 mM ATP, 2.5 mM MnCl₂, and 25% (w/v) PEG 8000 with 20 units of CircLigase™ (Epicentre) at 50° C. for 16 hours. The reactions were resolved on 8M urea/15% PAAG, followed by densitometry using excitation at 450 nm.
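As an illustrative sketch only, the densitometry calculation described above (product intensity divided by the sum of product and unligated starting material intensities) can be written in Python. The function name and the example band intensities are hypothetical and are not taken from the experiments.

```python
def fraction_ligated(product_intensity: float, unligated_intensity: float) -> float:
    """Return product / (product + unligated) from gel band intensities."""
    total = product_intensity + unligated_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return product_intensity / total


# Hypothetical band intensities in arbitrary densitometry units (not measured data).
example = fraction_ligated(product_intensity=900.0, unligated_intensity=100.0)
print(f"fraction ligated: {example:.0%}")
```

The ratio is bounded between 0 and 1, so values from different lanes can be compared directly even when the total loaded signal differs.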

TABLE 2

Donor         Acceptor      T4 RNA ligase   CircLigase™
All-DNA       All-DNA        9%             89%
All-DNA       All-MNA       14%             68%
All-DNA       DNA-3′MNA     46%             85%
All-MNA       All-DNA       11%             84%
All-MNA       All-MNA       20%             29%
All-MNA       DNA-3′MNA     32%             73%
5′-MNA-DNA    All-DNA       29%             90%
5′-MNA-DNA    All-MNA       16%             46%
5′-MNA-DNA    DNA-3′MNA     69%             81%

Generally, CircLigase™ produced higher ligation yields than T4 RNA ligase (Table 2). When both donor and acceptor were DNA/MNA hybrid oligonucleotides, efficient ligation was achieved with T4 RNA ligase.
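The comparison drawn from Table 2 can be checked with a short Python sketch. The percentages are transcribed from Table 2; the dictionary layout and the function name are assumptions made for illustration.

```python
# (donor, acceptor) -> (T4 RNA ligase %, CircLigase %), transcribed from Table 2.
TABLE_2 = {
    ("All-DNA", "All-DNA"): (9, 89),
    ("All-DNA", "All-MNA"): (14, 68),
    ("All-DNA", "DNA-3′MNA"): (46, 85),
    ("All-MNA", "All-DNA"): (11, 84),
    ("All-MNA", "All-MNA"): (20, 29),
    ("All-MNA", "DNA-3′MNA"): (32, 73),
    ("5′-MNA-DNA", "All-DNA"): (29, 90),
    ("5′-MNA-DNA", "All-MNA"): (16, 46),
    ("5′-MNA-DNA", "DNA-3′MNA"): (69, 81),
}


def compare_ligases(table):
    """Return the pairs where CircLigase beat T4 RNA ligase, and the best T4 pair."""
    circ_higher = [pair for pair, (t4, circ) in table.items() if circ > t4]
    best_t4_pair = max(table, key=lambda pair: table[pair][0])
    return circ_higher, best_t4_pair


circ_higher, best_t4_pair = compare_ligases(TABLE_2)
print(f"CircLigase higher in {len(circ_higher)} of {len(TABLE_2)} combinations")
print(f"Best T4 RNA ligase combination: {best_t4_pair} at {TABLE_2[best_t4_pair][0]}%")
```

In this tabulation the highest T4 RNA ligase yield falls on the combination in which both donor and acceptor are DNA/MNA hybrids, consistent with the statement above.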

FIG. 5C shows high yield ligation achieved for T4 RNA ligase at high enzyme and oligonucleotide concentrations. The reaction conditions included the following: 250 μM each of donor and acceptor oligonucleotides in a buffer containing 50 mM Tris HCl, 10 mM MgCl₂, 1 mM hexamine cobalt chloride, 2.5 mM ATP, 30% (w/v) PEG4600, pH 7.5, different amounts of T4 RNA ligase at 40 units/(NEB-new units), and 0.1 unit of yeast inorganic pyrophosphatase. The reactions were incubated at 37° C. for 5 and 20 hours and resolved on 8M urea/15% PAAG, followed by densitometry using excitation at 450 nm.

Overall, these data suggest that enzymatic ligation can be optimized by including one or more modified 2′-nucleotides and/or by using an RNA or DNA ligase. Further details for several other tested conditions, such as PEG or tag length, that can contribute to ligation efficiency are discussed below.

Example 2 Effect of PEG on Single-Stranded Ligation

To determine the effect of PEG molecular weight (MW) on ligation, single-stranded tags were ligated with 25% (w/v) of PEG having a MW from 300 to 20,000 Daltons. As shown in FIG. 6A, 80% or greater ligation was observed for PEG having a MW of 3,350, 4,000, 6,000, 8,000, and 20,000. These ligation experiments included the following tags: a 15mer donor having the sequence of 5′-P-mGTG CAG GTA GAG TGC-6-FAM-3′ (SEQ ID NO: 12) and a 15mer acceptor having the sequence of 5′-HO-mUAC GTA TAC GAC TGmG-OH-3′ (SEQ ID NO: 13). These oligonucleotide tags were DNA sequences with one or two terminal 2′-O-methyl (2′-OMe) RNA bases (e.g., 2′-OMe-U (mU) or 2′-OMe-G (mG)).

Experiments were also conducted to determine the effect of PEG concentration. Single-stranded tags were ligated with various concentrations of PEG having a MW of 4,600 Daltons (PEG4600). As shown in FIG. 6B, 70% or greater ligation, on average, was observed for 25% (w/v) to 35% (w/v) PEG4600.

Example 3 Effect of Tag Length on Single-Stranded Ligation

To determine the effect of tag length on ligation, acceptor and donor tags of various lengths were constructed. For CircLigase™ experiments, a 15mer donor having the sequence 5′-P-mGTG CAG GTA GAG TGC-6-FAM-3′ (SEQ ID NO: 12) was used and paired with 10, 12, 14, 16, and 18mer DNA acceptor oligonucleotides. For T4 RNA ligase experiments, the tags included one or more 2′-OMe-bases (designated as being MNA/DNA tags). Table 3 provides the sequence for the three donor tags (15mer, 8mer, and 5mer) and the three acceptor tags (15mer, 8mer, and 5mer).

TABLE 3

Oligonucleotide tag   Sequence*
15mer donor           5′-P-mGTG CAG GTA GAG TGC-6-FAM-3′ (SEQ ID NO: 12)
15mer acceptor        5′-HO-mUAC GTA TAC GAC TGmG-OH-3′ (SEQ ID NO: 13)
 8mer donor           5′-P-mGT GAG TGC-6-FAM-3′ (SEQ ID NO: 14)
 8mer acceptor        5′-HO-C A GAC TGmG-OH-3′ (SEQ ID NO: 15)
 5mer donor           5′-P-mGT GAC-6-FAM-3′ (SEQ ID NO: 16)
 5mer acceptor        5′-HO-mAC TGmG-OH-3′ (SEQ ID NO: 17)

*“m” indicates a 2′-OMe base, “P” indicates a phosphorylated nucleotide, and “FAM” indicates fluorescein.
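The tag notation of Table 3, in which an “m” prefix marks a 2′-OMe base, can be parsed mechanically to confirm the stated tag lengths. The following Python sketch is illustrative; the function name is hypothetical, and only the nucleotide portion of each sequence is used (the 5′-P, 5′-HO, 3′-OH, and 6-FAM end groups are omitted).

```python
import re


def parse_tag(sequence: str):
    """Split a tag such as 'mGTG CAG' into (base, is_2_OMe) tuples."""
    compact = sequence.replace(" ", "")
    tokens = re.findall(r"m?[ACGTU]", compact)
    if "".join(tokens) != compact:
        raise ValueError(f"unrecognized characters in {sequence!r}")
    return [(token[-1], token.startswith("m")) for token in tokens]


# Nucleotide portions transcribed from Table 3.
TABLE_3 = {
    "15mer donor": "mGTG CAG GTA GAG TGC",
    "15mer acceptor": "mUAC GTA TAC GAC TGmG",
    "8mer donor": "mGT GAG TGC",
    "8mer acceptor": "C A GAC TGmG",
    "5mer donor": "mGT GAC",
    "5mer acceptor": "mAC TGmG",
}

# name -> (total nucleotides, number of 2'-OMe nucleotides)
summary = {}
for name, seq in TABLE_3.items():
    nucleotides = parse_tag(seq)
    summary[name] = (len(nucleotides), sum(is_ome for _, is_ome in nucleotides))
    print(name, summary[name])
```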

The extent of ligation was analyzed by densitometry of electrophoretic gels (FIGS. 7A-7B). The results of the CircLigase™ reactions indicate a strong dependence of ligation yield on the length of the acceptor oligonucleotide (FIG. 7A). The highest ligation yield was observed with an 18mer acceptor (62%), while ligation yield with a 10mer acceptor was lower than 10%. The results of the T4 RNA ligase reactions indicate that the combination of an 8mer acceptor with an 8mer donor provided the highest yield and that combinations having a 15mer donor with any of the tested acceptors provided yields greater than 75% (FIG. 7B). If a library includes shorter tags (i.e., about 10mer or shorter), then T4 RNA ligase may be preferred for tag ligation. In other cases, ligation can be further optimized by using CircLigase™ or a combination of T4 RNA ligase and CircLigase™.

Example 4 Effect of Purification on Single-Stranded Ligation

To determine the effect of purification on ligation, single-stranded tags were ligated to imitate the library synthetic process. For these experiments, the tags included 15mer donor and 15mer acceptor tags, as provided above in Table 3. The chemical entity was bound to the 3′-terminus of the library, where the chemical entity was fluorescein in this example to aid in visualization. As shown in FIG. 9 (right), successive tags were ligated to the 5′-OH group of the complex after phosphorylation by T4 PNK.

Experiments were also conducted by purifying the ligated product (i.e., the complex) prior to the PNK reaction, where particular agents useful in the ligation reaction (e.g., phosphate, cobalt, and/or unreacted tags) can inhibit the phosphorylation reaction with PNK or reduce ligation yield. As shown in FIG. 9 (left), purifying the complex (i.e., minimal precipitation) prior to the PNK reaction increased ligation (see data marked with *, indicating purification). FIGS. 8A-8B show LC-MS spectra for a 15mer MNA/DNA tag before and after phosphorylation. The presence or absence of DTT had no effect on phosphorylation.

Example 5 Chemically Co-Reactive Pair Ligation and Reverse Transcription of Junctions

The methods described herein can further include chemically co-reactive pair ligation techniques, as well as enzyme ligation techniques. Accordingly, as an example of chemical ligation, an exemplary chemically co-reactive pair (i.e., an alkyne and an azido pair in a cycloaddition reaction) was used in two variants: a short chemically co-reactive pair and a long chemically co-reactive pair.

Materials

In a first variant, a short chemically co-reactive pair (FIG. 10A) was used. The pair included (i) an oligonucleotide having the sequence 5′-GCG TGA ACA TGC ATC TCC CGT ATG CGT ACA GTC CAT T/propargylG/-3′ (“5end3propargyl,” SEQ ID NO: 18) and (ii) an oligonucleotide having the sequence 5′-/azidoT/ATA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (“3end5azido,” SEQ ID NO: 19). This pair of oligonucleotides was prepared by TriLink BioTechnologies, Inc. (San Diego, Calif.). These oligonucleotides were designed to produce a short spacer between two oligonucleotides upon ligation, where the linker would be 5 atoms long (counting from the C3′-position of the 5end3propargyl oligonucleotide to the C5′-position of the 3end5azido oligonucleotide). In addition, the 5′-azido oligonucleotide (3end5azido) was prepared by converting the iodo group in the corresponding 5′-iodo oligonucleotide into an azido group.

In a second variant, a long chemically co-reactive pair (FIG. 10B) was used. The pair included (i) an oligonucleotide having the sequence 5′-GCG TGA ACA TGC ATC TCC CGT ATG CGT ACA GTC CAT TG/spacer7-azide/-3′ (“5end3azide,” SEQ ID NO: 20) and (ii) an oligonucleotide having the sequence 5′-/hexynyl/TA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (“3end5hexynyl,” SEQ ID NO: 21). This pair of oligonucleotides was prepared by Integrated DNA Technologies, Inc. (IDT DNA, San Diego, Calif., and Coralville, Iowa). The 5end3azide oligonucleotide was prepared by reacting an azidobutyrate N-hydroxysuccinimide ester with a 3′-amino-modifier C7 (2-dimethoxytrityloxymethyl-6-fluorenylmethoxycarbonylamino-hexane-1-succinoyl-long chain alkylamino), which was introduced during oligonucleotide column synthesis. This pair was designed to produce a 24 atom long spacer between the oligonucleotides (counting from the C3′-position of the 5end3azide oligonucleotide to the C5′-position of the 3end5hexynyl oligonucleotide).

For reverse transcription (as shown by the schematic in FIG. 11A), the primers and templates included the following: a reverse transcription primer having the sequence of 5′-/Cy5/CAG TAC GCA AGC TCG-3′ (“Cy5s_primer15,” SEQ ID NO: 22); a control template having the sequence of 5′-GCG TGA ACA TGC ATC TCC CGT ATG CGT ACA GTC CAT TGT ATA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (“templ75,” SEQ ID NO: 23); a 5′-PCR primer having the sequence of 5′-GCG TGA ACA TGC ATC TCC-3′ (SEQ ID NO: 24); and a 3′-PCR primer having the sequence of 5′-CAG TAC GCA AGC TCG CC-3′ (SEQ ID NO: 25), where these sequences were obtained from IDT DNA. A Cy5-labeled DNA primer was used for the experiments to enable separate detection of the reverse transcription products by LC.

Experimental Conditions

For the chemically co-reactive pair ligations, 1 mM solutions of chemically co-reactive pairs, such as 5end3propargyl+3end5azido (short) or 5end3azide+3end5hexynyl (long), were incubated for 12 hours in the presence of 100 equivalents of TBTA ligand (tris-[(1-benzyl-1H-1,2,3-triazol-4-yl)methyl]amine) and 50 equivalents of CuBr in a water/dimethyl acetate mixture. Following the reaction, an excess of EDTA was added, and the reaction mixtures were desalted using Zeba Spin Desalting Columns (Invitrogen Corp., Carlsbad, Calif.) and then ethanol precipitated. For the reverse transcription reactions, the templates were purified on a 15% polyacrylamide gel containing 8M urea.

Liquid chromatography-mass spectrometry (LC-MS) was performed on a Thermo Scientific LCQ Fleet using an ACE 3 C18-300 (50×2.1 mm) column and a 5 minute gradient of 5-35% of buffer B using buffer A (1% hexafluoroisopropanol (HFIP), 0.1% di-isopropylethyl amine (DIEA), 10 μM EDTA in water) and buffer B (0.075% HFIP, 0.0375% DIEA, 10 μM EDTA, 65% acetonitrile/35% water). LC was monitored at 260 nm and 650 nm. MS was detected in the negative mode, and mass peak deconvolution was performed using ProMass software.

Reverse transcription reactions were performed using ThermoScript™ RT (Invitrogen Corp.), according to the manufacturer's protocol, at 50° C. for 1-2 hours. The results were analyzed by LC-MS and by PCR. PCR was performed using Platinum® SuperMix and resolved on 4% agarose E-Gels (both from Invitrogen Corp.). Eleven and eighteen cycles of PCR were performed with or without a preceding RT reaction. The 75mer template was not reverse transcribed and was used directly for the PCR amplification.

Results and Discussion

In both the ligations forming a short spacer and a long spacer, reaction yields were high, close to quantitative, as analyzed by LC-MS. Accordingly, chemical ligation provides a high yield technique to bind or operatively associate a headpiece to one or more building block tags.

For a viable chemical ligation strategy to produce DNA-encoded libraries, the resultant complex should be capable of undergoing PCR or RT-PCR for further sequencing applications. While PCR and RT-PCR may not be an issue with enzymatically ligated tags, such as described above, unnatural chemical linkers may be difficult to process by RNA or DNA polymerases. The data provided in FIGS. 11B-11E suggest that oligonucleotides having a spacer of particular lengths can be transcribed and/or reverse transcribed.

In the case of a chemically co-reactive pair linker resulting in a triazole-linked oligonucleotide, a dependence on the length of the linker was observed. For the short chemically co-reactive pair, the resultant template was reverse transcribed and analyzed by LC-MS. LC analysis revealed three major absorption peaks at 2.79 min., 3.47 min., and 3.62 min. for 260 nm, where the peaks at 3.47 min. and 3.62 min. also provided absorption peaks at 650 nm. MS analysis of the peak at 3.47 min. showed only the presence of the template 23097.3 (calc'd 23098.8), and the peak at 3.62 min. contained a template (23098.0) and a fully extended primer (23670.8, calc'd: 23671.6) at an approximately 1.7:1 ratio, suggesting a 50-60% yield for this RT reaction (FIG. 11C). For comparison, reverse transcription (RT) of the control having an all-DNA template produced the extended primer (peak 23068.9) in an amount roughly equivalent to the template (23078.7), suggesting close to a 100% yield (FIG. 11B).
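The yield estimate from the template to extended primer ratio can be reproduced with a one-line calculation. The Python sketch below assumes, for illustration only, that each fully extended primer derives from one template molecule and that both species respond equally in the MS analysis; the function name is hypothetical.

```python
def rt_yield_from_ratio(template: float, extended_primer: float) -> float:
    """Estimate RT yield as extended primer per template (equal response assumed)."""
    if template <= 0:
        raise ValueError("template amount must be positive")
    return extended_primer / template


# Short-linker template: template to extended primer at approximately 1.7:1.
short_linker_yield = rt_yield_from_ratio(template=1.7, extended_primer=1.0)
# All-DNA control: extended primer roughly equivalent to template.
control_yield = rt_yield_from_ratio(template=1.0, extended_primer=1.0)
print(f"short linker: {short_linker_yield:.0%}, control: {control_yield:.0%}")
```

Under these assumptions, a 1.7:1 ratio falls within the 50-60% range stated above.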

For the long chemically co-reactive pair, LC of the RT reaction showed two absorption peaks at 2.77 min and 3.43 min for 260 nm, where the peak at 3.43 min also provided absorption peaks at 650 nm, i.e., contained a Cy5 labeled material, which is the expected RT product. MS analysis of the peak at 3.43 min. revealed the template (observed 23526.6, calc'd: 23534.1), as well as the Cy5 primer extended to the linker (11569.1). No full length product was observed by LC-MS, indicating that the RT reaction did not occur in a measurable amount (FIG. 11D).

RT-PCR was performed with the templates described above and revealed that only the short linker yielded reverse transcription product, albeit at 5-10-fold lower efficiency (FIG. 11E). Efficiency of the RT was estimated to be about 2-fold lower than the template (templ75). For example, the PCR product of the short ligated template was around 2-fold lower after RT and around 5-10 times lower without RT, as compared to the PCR product of the all-DNA template 75 (templ75). Accordingly, these data provide support for the use of chemical ligation to produce a complex that can be reverse transcribed and/or transcribed, and chemically ligated headpieces and/or tags can be used in any of the binding steps described herein to produce encoded libraries.

Example 6 Ligation of 3′-Phosphorothioate Oligonucleotides with 5′-Iodo Oligonucleotides

To determine the flexibility of the methods described herein, the ligation efficiency of oligonucleotides having other modifications was determined. In particular, analogs of the natural phosphodiester linkage (e.g., a phosphorothioate analog) could provide an alternative moiety for post-selection PCR analysis and sequencing.

The following oligonucleotides were synthesized by TriLink BioTechnologies, Inc. (San Diego, Calif.): (i) 5′-/Cy5/CGA TAT ACA CAC TGG CGA GCT/thiophosphate/-3′ (“CCy5,” SEQ ID NO: 26), (ii) 5′-/IododT/GC GTA CTG AGC/6-FAM/-3′ (“CFL,” SEQ ID NO: 27), as shown in FIG. 12A, and (iii) a splint oligonucleotide having the sequence of CAG TAC GCA AGC TCG CC (“spl,” SEQ ID NO: 28). Ligation reactions were performed with 100 μM of each reactant oligonucleotide in a buffer containing 50 mM Tris HCl (pH 7.0), 100 mM NaCl, and 10 mM MgCl₂ (“ligation buffer”) at room temperature. The ligation reactions were supplemented by one of the following: 100 μM of the splint oligonucleotide, 10 mM Co(NH₃)₆Cl₃, 40% (w/v) of PEG4000, or 80% (w/v) of PEG300. The reaction was allowed to progress for up to 48 hours. Ligation products were analyzed by LC-MS using detection at 260 nm, 495 nm, and 650 nm, as well as by an 8M urea/15% polyacrylamide gel (PAAG) that was further scanned at 450 and 635 nm excitation on a Storm™ 800 PhosphorImager.

In the absence of the splint oligonucleotide, no ligation was observed (FIG. 12B, lanes labeled “-spl”). In the presence of the splint oligonucleotide, ligation occurred and reached around 60% of fraction ligated after 48 hours (FIGS. 12B-12C). LC-MS revealed several peaks in the chromatogram, with a peak at 3.00 min absorbing at 260 nm, 495 nm, and 650 nm. MS of this peak showed mostly the product of ligation at 11539.6 Da (calc'd 11540) with less than 10% of CCy5 oligonucleotide at 7329.8 Da (calc'd 7329.1). Low levels of ligation were detected in the presence of PEGs and hexamine cobalt, where hexamine cobalt caused precipitation of the Cy5-labeled oligonucleotide. These data suggest that headpieces and/or tags having modified phosphate groups (e.g., modified phosphodiester linkages, such as phosphorothioate linkages) can be used in any of the binding steps described herein to produce encoded libraries.

In order to further study the iodo-phosphorothioate ligation reaction, the ligation of 5′-I dT-oligo-3′-FAM (CFL) and 5′-Cy5-oligo-3′-PS (CCy5) was performed in the absence and presence of a splint under different reaction conditions.

In a first set of conditions, ligation experiments were conducted with incubation for seven to eight days. These experiments were performed in the same ligation buffer as above with 50 μM of each oligonucleotide and incubated for a week at room temperature. FIG. 12D shows LC-MS analysis of the ligation of CFL and CCy5 in the absence (top) and presence (bottom) of a splint (positive control), where ligation reactions were incubated for seven days. Three LC traces were recorded for each reaction at 260 nm (to detect all nucleic acids), at 495 nm (to detect the CFL oligonucleotide and the ligation product), and at 650 nm (to detect the CCy5 oligonucleotide and the ligation product).

In the absence of the splint, no ligation occurred, and only starting materials CFL (4339 Da) and CCy5 (7329 Da) were detected (FIG. 12D, top). When the splint oligonucleotide was present for seven days, a characteristic peak was observed in the 495 nm channel with a retention time of 2.98 min, which corresponds to the ligated product (11542 Da) (FIG. 12D, bottom). This peak overlapped with that for the CCy5 oligonucleotide observed at the 650 nm channel and, thus, was indistinguishable from CCy5 at 650 nm.

FIG. 12E shows the LC-MS analysis of CFL and CCy5 in the absence of a splint, where ligation reactions were incubated for eight days at 400 μM of each oligonucleotide. No ligation product was detected. Peak 1 (at 495 nm) contained CFL starting material (4339 Da), as well as traces of the loss of iodine product (4211 Da) and an unknown degradation product (4271 Da, possibly ethyl mercaptan displacement). Peak 2 (at 650 nm) contained CCy5 starting material (7329 Da) and oxidized CCy5 oligonucleotide (7317 Da). Peak 3 (at 650 nm) contained dimerized CCy5 (14663 Da).

In a second set of conditions, iodine displacement reactions were conducted in the presence of piperidine and at a pH higher than 7.0. FIG. 12F shows MS analysis for a reaction of CFL oligonucleotide with piperidine, where this reaction was intended to displace the terminal iodine present in CFL. One reaction condition included oligonucleotides at 100 μM, piperidine at 40 mM (400 equivalents) in 100 mM borate buffer, pH 9.5, for 20 hrs at room temperature (data shown in left panel of FIG. 12F); and another reaction condition included oligonucleotides at 400 μM, piperidine at 2 M (4,000 equivalents) in 200 mM borate buffer, pH 9.5, for 2 hrs at 65° C. (data shown in right panel of FIG. 12F).

In the reaction condition including 40 mM of piperidine (FIG. 12F, left), no piperidine displacement was observed, and a small amount of hydrolysis product was detected (4229 Da). In addition, traces of the loss of iodine (4211 Da) and unknown degradation product (4271 Da) were observed. In the reaction condition including 2 M of piperidine (FIG. 12F, right), piperidine displacement of iodine was observed (4296 Da), and the amount of starting material was substantially diminished (4339 Da). In addition, peaks corresponding to hydrolysis of iodine (by displacement of OH) or impurity (4229 Da) and loss of iodine (4214 Da) were also observed. These data show that the presence of an amine (e.g., as part of chemical library synthesis) will not detrimentally affect the oligonucleotide portion of the library members and/or interfere with this ligation strategy.
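The masses reported for the CFL-derived species can be rationalized by simple mass arithmetic, sketched below in Python. The starting masses are the rounded values reported above; the atomic and group masses are standard average values. The species assignments in the code, in particular treating the 4211 Da species as formal loss of HI and the ligation product as displacement of iodide by the deprotonated phosphorothioate, are assumptions made for illustration rather than assignments stated in the text.

```python
# Average masses in Da (standard atomic weights).
MASS_I = 126.904
MASS_H = 1.008
MASS_OH = 17.007           # O + H
MASS_PIPERIDINYL = 84.142  # C5H10N

# Rounded masses of the starting oligonucleotides as reported in the text.
CFL = 4339.0   # 5'-iodo oligonucleotide
CCY5 = 7329.0  # 3'-phosphorothioate oligonucleotide


def substitute_iodine(parent: float, incoming_group: float) -> float:
    """Mass after replacing the terminal iodine with another group."""
    return parent - MASS_I + incoming_group


expected_masses = {
    # Assumption: the thiophosphate loses H and displaces iodide.
    "ligation product": substitute_iodine(CFL, CCY5 - MASS_H),
    "hydrolysis (I replaced by OH)": substitute_iodine(CFL, MASS_OH),
    "piperidine displacement": substitute_iodine(CFL, MASS_PIPERIDINYL),
    # Assumption: "loss of iodine" treated as formal loss of HI.
    "loss of HI": CFL - MASS_I - MASS_H,
}

for species, mass in expected_masses.items():
    print(f"{species}: {mass:.1f} Da")
```

Under these assumptions the computed values fall within about 1 Da of the reported 11540 Da, 4229 Da, 4296 Da, and 4211 Da species.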

In a third set of conditions, splint ligation reactions were conducted in the presence of piperidine and at a pH higher than 7.0. FIG. 12G shows a splint ligation reaction of CFL and CCy5 oligonucleotides at 50 μM performed in the presence of 400 equivalents of piperidine in 100 mM borate buffer, pH 9.5, for 20 hrs at room temperature. The characteristic peak detected in the LC trace (at 495 nm) contained predominantly the product of ligation at 11541.3 Da (calc'd 11540 Da). Based on these results, it can be concluded that piperidine does not impair enzymatic ligation and that the presence of other amines (e.g., as part of chemical library synthesis) will likely not interfere with this ligation strategy.

Taken together, these data indicate that this ligation strategy can be performed under various reaction conditions that are suitable for a broad range of chemical transformations, including extended incubation times, elevated pH conditions, and/or presence of one or more amines. Thus, the present methods can be useful for developing library members with diverse reaction conditions and precluding the necessity of buffer exchange, such as precipitation or other resource-intensive methods.

Example 7 Minimization of Shuffling with Modified Nucleotides

During single-stranded enzymatic ligation with T4 RNA ligase, a low to moderate extent of terminal nucleotide shuffling can occur. Shuffling can result in the inclusion or excision of a nucleotide, where the final product or complex includes or excludes a nucleotide compared to the expected ligated sequence (i.e., a sequence having the complete sequence for both the acceptor and donor oligonucleotides).

Though low levels of shuffling can be tolerated, shuffling can be minimized by including a modified phosphate group. In particular, the modified phosphate group is a phosphorothioate linkage between the terminal nucleotide at the 3′-terminus of an acceptor oligonucleotide and the nucleotide adjacent to the terminal nucleotide. By using such a phosphorothioate linkage, shuffling was greatly reduced. Only residual shuffling was detected by mass spectrometry, where shuffling likely arose due to incomplete conversion of the native phosphodiester linkage into the phosphorothioate linkage or to low levels of oxidation of the phosphorothioate linkage followed by conversion into the native phosphodiester linkage. Taken together, these data and the ligation data in Example 6 indicate that one or more modified phosphate groups (e.g., a phosphorothioate or a 5′-N-phosphoramidite linkage) could be included in any oligonucleotide sequence described herein (e.g., between the terminal nucleotide at the 3′-terminus of a headpiece, a complex, a building block tag, or any tag described herein, and the nucleotide adjacent to the terminal nucleotide) to minimize shuffling during single-stranded ligation.

A single stranded headpiece (ssHP, 3636 Da) was phosphorylated at the 5′-terminus and modified with a hexylamine linker at the 3′-terminus to provide the sequence of 5′-P-mCGAGTCACGTC/Aminohex/-3′ (SEQ ID NO: 29). The headpiece was ligated to a tag (tag 15, XTAGSS000015, 2469 Da) having the sequence of 5′-mCAGTGTCmA-3′ (SEQ ID NO: 30), where mC and mA indicate 2′-O-methyl nucleotides. LC-MS analysis (FIG. 13A) revealed that the ligation product peak contained up to three species, which were partially separated by LC and had the following molecular weights: 6089 Da (expected), 5769 Da (−320 Da from expected) and 6409 Da (+320 Da from expected). This mass difference of 320 Da corresponds exactly to either removal or addition of an extra O-Me C nucleotide (“terminal nucleotide shuffling”).
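The assignment of the three species by their offset from the expected mass can be sketched in Python as follows. The expected mass (6089 Da) and the shift (320 Da) are taken from the text; the tolerance value and the function name are hypothetical choices made for illustration.

```python
EXPECTED_PRODUCT_DA = 6089.0  # expected ligation product, from the text
SHUFFLE_SHIFT_DA = 320.0      # mass of one O-Me C nucleotide, from the text


def classify_species(observed_da: float, tolerance_da: float = 2.0) -> str:
    """Label an observed mass relative to the expected product.

    tolerance_da is a hypothetical matching window, not a reported value.
    """
    delta = observed_da - EXPECTED_PRODUCT_DA
    if abs(delta) <= tolerance_da:
        return "expected product"
    if abs(delta - SHUFFLE_SHIFT_DA) <= tolerance_da:
        return "product +1 nt"
    if abs(delta + SHUFFLE_SHIFT_DA) <= tolerance_da:
        return "product -1 nt"
    return "unassigned"


labels = {mass: classify_species(mass) for mass in (6089.0, 5769.0, 6409.0)}
for mass, label in labels.items():
    print(f"{mass:.0f} Da: {label}")
```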

Experiments with other terminal O-Me nucleotides, as well as terminal 2′-fluoro nucleotides, confirmed that shuffling likely occurs by cleavage of the 5′-terminal nucleotide of the donor oligonucleotide, probably after adenylation of the latter. The mechanism of this event is unknown. Without being limited by mechanism, FIG. 13B illustrates a possible scheme for nucleotide reshuffling during T4 RNA ligase reaction between a headpiece and a tag, where one of skill in the art would understand that this reaction could occur between any donor and acceptor oligonucleotides (e.g., between two tags, where one tag is the donor oligonucleotide and the other tag is the acceptor oligonucleotide).

Generally, the majority of the ligation reaction with T4 RNA ligase (T4Rnl1) provides the expected (normal) ligation product having the combined sequence of both the donor and acceptor oligonucleotides (FIG. 13B-1, reaction on left). A small minority of the reaction provides aberrant ligation products (FIG. 13B-1, reaction on right), where these aberrant products include those having the removal or addition of a terminal nucleotide (“Product−1 nt” and “Product+1 nt,” respectively, in FIG. 13B-2).

Without being limited by mechanism, cleavage of the donor oligonucleotide (“headpiece” or “HP” in FIG. 13B-1) may occur by reacting with the 3′-OH group of the acceptor (“tag”), thereby providing a 5′-phosphorylated donor lacking one nucleotide (“HP-1 nt”) and an adenylated nucleotide with an accessible 3′-OH group (“1 nt”). FIG. 13B-2 shows two exemplary schemes for the reaction between the headpiece (HP), tag, HP-1 nt, and 1 nt. To provide a product with an excised terminal nucleotide (FIG. 13B-2, left), the 5′-phosphorylated donor lacking one nucleotide (HP-1 nt) acts as a substrate for the ligation event. This HP-1 nt headpiece is re-adenylated by T4 RNA ligase (to provide “Adenylated HP-1 nt” in FIG. 13B-2) and ligated to the tag, resulting in a ligation product minus one nucleotide (“Product−1 nt”). To provide a product with an additional terminal nucleotide (FIG. 13B-2, left), the adenylated nucleotide (1 nt) likely serves as a substrate for ligation to the tag, thereby producing an oligonucleotide having one nucleotide longer than the acceptor (“Tag+1 nt”). This Tag+1 nt oligonucleotide likely serves as an acceptor for the unaltered headpiece, where this reaction provides a ligation product having an additional nucleotide (“Product+1 nt”). LC-MS analyses of “Product”, “Product−1 nt”, and “Product+1 nt” were performed (FIG. 13B-3). When an aberrant tag and an aberrant headpiece (i.e., Tag+1 nt and HP-1 nt, respectively) recombine, then the resultant ligation product is indistinguishable from the expected product.
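The proposed reshuffling scheme can be followed at the sequence level with a small Python model. This is a sketch of the scheme as described, not a mechanistic simulation; the list representation, variable names, and the `ligate` helper are hypothetical. Sequences are written 5′ to 3′, and ligation joins the 3′-OH of the acceptor to the 5′-phosphate of the donor.

```python
# Nucleotide sequences from the text, written 5' to 3' (end groups omitted).
HP = ["mC", "G", "A", "G", "T", "C", "A", "C", "G", "T", "C"]  # donor (headpiece)
TAG = ["mC", "A", "G", "T", "G", "T", "C", "mA"]               # acceptor (tag 15)


def ligate(acceptor, donor):
    """Join the acceptor 3'-end to the donor 5'-end."""
    return acceptor + donor


product = ligate(TAG, HP)            # expected (normal) product

hp_minus_1 = HP[1:]                  # "HP-1 nt": donor lacking its 5'-terminal nucleotide
one_nt = HP[:1]                      # "1 nt": the released 5'-terminal nucleotide

product_minus_1 = ligate(TAG, hp_minus_1)  # "Product-1 nt"
tag_plus_1 = ligate(TAG, one_nt)           # "Tag+1 nt"
product_plus_1 = ligate(tag_plus_1, HP)    # "Product+1 nt"

# Aberrant tag plus aberrant headpiece regenerates the expected sequence.
recombined = ligate(tag_plus_1, hp_minus_1)

print(len(product), len(product_minus_1), len(product_plus_1))
print("recombined equals expected:", recombined == product)
```

The last comparison illustrates why recombination of Tag+1 nt with HP-1 nt yields a product that cannot be distinguished from the expected product.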

To further study the mechanism of terminal nucleotide reshuffling, a headpiece (HP-PS) having the sequence of 5′P-mC*GAGTCACGTC/Aminohex/-3′ (SEQ ID NO: 31) was prepared. Headpiece HP-PS has the same sequence as ssHP but contains one modification, namely the first phosphodiester linkage between 5′-terminal nucleotide mC and the following G was synthesized as a phosphorothioate linkage (one non-bridging phosphate oxygen was substituted by a sulfur). LC-MS analysis of the HP-PS ligation to tag 15 revealed that shuffling was almost completely inhibited (FIG. 13C). Traces of +/−320 peaks likely correspond to the oxidative conversion of the phosphorothioate linkage into native phosphodiester linkages or incomplete sulfurization.

Example 8 Size Exclusion Chromatography of Library Members

Libraries of chemical entities that are generated using short, single-stranded oligonucleotides as encoding elements are well suited for the enrichment of binders via size exclusion chromatography (SEC). SEC is a chromatographic technique that separates molecules on the basis of size, where larger molecules having higher molecular weight flow through the column faster than smaller molecules having lower molecular weight.

Complexes of proteins and ssDNA library members can be readily separated from unbound library members using SEC. FIG. 14 is an ultraviolet trace from an SEC experiment in which a small molecule covalently attached to short ssDNA (a range of oligonucleotides with defined lengths in the 20-50 mer range) was mixed with a protein target known to bind the small molecule. The peaks that elute first from the column, in the 11-13 minute time range, represent target-associated library members. The later peaks, eluting from 14-17 minutes, represent unbound library members. The ratio of protein target to library molecule was 2:1, so approximately 50% of the library molecules should associate with the protein in the early eluting fraction, as observed in FIG. 14. Libraries with larger, double-stranded oligonucleotide coding regions cannot be selected using this method since the unbound library members co-migrate with the bound library members on SEC. Thus, small molecule libraries attached to encoding single-stranded oligonucleotides in the 20-50 mer length range enable the use of a powerful separation technique that has the potential to significantly increase the signal-to-noise ratio required for the effective selection of small molecule binders to one or more targets, e.g., novel protein targets that are optionally untagged and/or wild-type protein. In particular, these approaches allow for identifying target-binding chemical entities in encoded combinatorially-generated libraries without the need for tagging or immobilizing the target (e.g., a protein target).

Example 9 Encoding with Chemically Ligated DNA Tags Using the Same Chemistry for Each Ligation Step

Encoding DNA tags can be ligated enzymatically or chemically. A general approach to chemical DNA tag ligation is illustrated in FIG. 15A. Each tag bears co-complementary reactive groups on its 5′ and 3′ ends. In order to prevent polymerization or cyclization of the tags, either (i) protection of one or both reactive groups (FIG. 15A), e.g., in the case of TIPS-protected 3′ alkynes, or (ii) splint-dependent ligation chemistry (FIG. 15B), e.g., in the case of 5′-iodo/3′-phosphorothioate ligation, is used. For (i), unligated tags can be removed or capped after each library cycle to prevent mistagging or polymerization of the deprotected tag. This step may be optional for (ii), but may still be included. Primer extension reactions, using polymerase enzymes that are capable of reading through chemically ligated junctions, can also be performed to demonstrate that ligated tags are readable and therefore the encoded information is recoverable by post-selection amplification and sequencing (FIG. 15C).

A library tagging strategy that implements ligation of the tags using “click-chemistry” (Cu(I)-catalyzed azide/alkyne cycloaddition) is shown in FIG. 16A. The implementation of this strategy relies on the ability to ligate the tags successively and precisely, avoiding mistagging and tag polymerization, as well as the ability to copy the chemically ligated DNA into amplifiable natural DNA (cDNA) for post-selection amplification and sequencing (FIG. 16C).

To achieve accurate tag ligation, triisopropylsilyl (TIPS)-protected 3′-propargyl nucleotides (synthesized from propargyl U in the form of a CPG matrix used for oligonucleotide synthesis) were used (FIG. 16B). The TIPS protecting group can be specifically removed by treatment with tetrabutylammonium fluoride (TBAF) in DMF at 60° C. for 1-4 hours. As a result, the ligation during library synthesis includes a 5′-azido/3′-TIPS-propargyl nucleotide (Tag A) reacting with the 3′-propargyl of the headpiece through a click reaction. After purification, the product of the previous cycle is treated with TBAF to remove TIPS and generate the reactive alkyne, which in turn reacts with the next cycle tag. The procedure is repeated for as many cycles as necessary to produce 2, 3, or 4 or more successively installed encoding tags (FIG. 16A).

Materials and Methods

Oligos: The following oligos were synthesized by Trilink Biotechnologies, San Diego, Calif.: ss-HP-alkyne: 5′-NH₂-TCG AAT GAC TCC GAT AT (3′-Propargyl G)-3′ (SEQ ID NO: 32); ss-azido-TP: 5′-azido dT ATA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (SEQ ID NO: 33); and B-azido: 5′ azido dT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (SEQ ID NO: 34).

ClickTag-TIPS: 5′-azido dT AT GCG TAC AGT CC (propargyl U-TIPS)-3′ (SEQ ID NO: 35) and 5′-dimethoxytrityl 2′-succinyl 3′-O-(triisopropylsilyl) propargyl uridine CPG were synthesized by Prime Organics, Woburn, Mass.

The following oligos were synthesized by IDT DNA Technologies, Coralville, Iowa: FAM-click-primer: (5′-6-FAM) CAG TAC GCA AGC TCG CC-3′ (SEQ ID NO: 36) and Cy5-click-primer: (5′-Cy5) CAG TAC GCA AGC TCG CC-3′ (SEQ ID NO: 37).

DNA55-control: /5′Biotin-TEG//ispC3//ispC3/-TCGAATGACTCCGATATGT ATA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (SEQ ID NO: 38).

rDNA55-control: /5Bio-TEG//ispC3//ispC3/-TCGAATGACTCCGATAT(riboG)T ATA GCG CGA TAT ACA CAC TGG CGA GCT TGC GTA CTG-3′ (SEQ ID NO: 39).

Synthesis of the templates: In the following examples, chemically ligated tags, and control sequences related to them, are referred to as “templates” because the subsequent step (“reading”) utilizes them as templates for template-dependent polymerization.

Tag ligation: To a solution of 1 equivalent (1 mM) of ss-HP-alkyne and 1 equivalent (1 mM) of ss-azido-TP in 500 mM pH 7.0 phosphate buffer was added a pre-mixed solution of 2 eq of Cu(II) acetate (to a final concentration of 2 mM), 4 eq of sodium ascorbate (to a final concentration of 4 mM), and 1 eq of TBTA (to a final concentration of 1 mM) in DMF/water. The mixture was incubated at room temperature overnight. After LC-MS confirmation of the completion of the reaction, the reaction was precipitated using salt/ethanol.
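The relationship between equivalents and final concentrations in this ligation mixture can be written out as a short calculation. The following is a minimal sketch: the equivalents and the 1 mM oligonucleotide concentration come from the paragraph above, whereas the reaction volume, the stock concentration, and the function names are hypothetical values introduced only for the example.

```python
OLIGO_MM = 1.0  # final concentration of each oligonucleotide (mM), from the text

# Equivalents relative to the oligonucleotide, from the text.
EQUIVALENTS = {
    "Cu(II) acetate": 2,
    "sodium ascorbate": 4,
    "TBTA": 1,
}


def final_concentrations_mm(oligo_mm: float, equivalents: dict) -> dict:
    """Final reagent concentration (mM) = equivalents x oligonucleotide concentration."""
    return {name: eq * oligo_mm for name, eq in equivalents.items()}


def stock_volume_ul(final_mm: float, reaction_ul: float, stock_mm: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_mm * reaction_ul / stock_mm


if __name__ == "__main__":
    REACTION_UL = 100.0  # hypothetical reaction volume
    STOCK_MM = 50.0      # hypothetical stock concentration for every reagent
    for name, conc in final_concentrations_mm(OLIGO_MM, EQUIVALENTS).items():
        vol = stock_volume_ul(conc, REACTION_UL, STOCK_MM)
        print(f"{name}: {conc:.1f} mM final, {vol:.1f} uL of {STOCK_MM:.0f} mM stock")
```

The computed final concentrations (2 mM, 4 mM, and 1 mM) match those stated in the text; the volumes depend entirely on the assumed stock and reaction volume.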

“Single click” templates Y55 and Y185 were synthesized by the reaction of ss-HP-alkyne with ss-azido-TP and B-azido, respectively. Double and triple click templates (YDC and YTC) were synthesized by click ligation of ss-HP-alkyne with ClickTag-TIPS, followed by deprotection of TIPS using TBAF (tetrabutylammonium fluoride) in DMF at 60° C. for an hour, followed by click ligation with ss-azido-TP. For the triple click template (YTC), the ligation and deprotection of ClickTag-TIPS was repeated twice.

The templates were reacted with biotin-(EG)₄-NHS and desalted (FIG. 17A). The final products were purified by RP HPLC and/or on a 15-20% polyacrylamide gel/8M urea and analyzed by LC-MS.

Enzymes: The following DNA polymerases with their reaction buffers were purchased from New England Biolabs: Klenow fragment of E. coli DNA polymerase I, Klenow fragment (exo-), E. coli DNA polymerase I, Therminator™, 9° N™, Superscript III™.

Streptavidin magnetic Dynabeads® M280 were purchased from Invitrogen.

Template-dependent polymerization assessment: Each template (5 μM) was incubated with 1 equivalent of either Cy5 or FAM Click-primer in 40 to 50 μL of the corresponding 1× reaction buffer and each enzyme, using reaction conditions according to the manufacturer's guidelines for 1 hour. Certain reactions (such as SSII or SSIII transcriptions) were additionally supplemented with 1 mM MnCl₂. The product of the reaction was loaded on 125 μL of pre-washed SA beads for 30 minutes with shaking. The beads were then collected, and the flowthrough was discarded. Beads were washed with 1 mL of Tris-buffered saline (pH 7.0) and eluted with 35 μL of 100 mM NaOH. The eluate was immediately neutralized by adding 10 μL of 1 M Tris HCl, pH 7.0. The products were analyzed using LC-MS.

Results and Discussion

Template Preparation: Each template, Y55, Y185 (FIGS. 17B and 17C), YDC and YTC (FIG. 19), was synthesized and purified to greater than 85% purity (the major impurity being un-biotinylated template). LC-MS revealed the following MWs for the templates: Y55 17,624 (calculated 17,619) Da; YDC 22,228 (calculated 22,228) Da; and YTC 26,832 (calculated 26,837) Da.
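The agreement between observed and calculated template masses can be tabulated with a few lines of arithmetic. This is only an illustrative sketch: the mass values are those quoted above, while the function name and the choice to report both the absolute difference and a relative error in parts per million are assumptions made for the example.

```python
# (observed, calculated) molecular weights in Da, as quoted in the text.
TEMPLATE_MASSES = {
    "Y55": (17624, 17619),
    "YDC": (22228, 22228),
    "YTC": (26832, 26837),
}


def mass_error(observed: float, calculated: float) -> tuple:
    """Return (difference in Da, relative error in ppm) of observed vs. calculated."""
    delta = observed - calculated
    ppm = delta / calculated * 1e6
    return delta, ppm


if __name__ == "__main__":
    for name, (obs, calc) in TEMPLATE_MASSES.items():
        delta, ppm = mass_error(obs, calc)
        print(f"{name}: observed {obs} Da, calculated {calc} Da, "
              f"difference {delta:+d} Da ({ppm:+.0f} ppm)")
```

For the three templates with reported masses, the observed values lie within 5 Da of the calculated values.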

The single click templates Y55 and Y185 (FIGS. 17B and 17C) were synthesized from oligonucleotides that bear only one click chemistry functionality (alkyne or azide). The efficiency of the click reaction (chemical ligation) was over 90% in an overnight reaction using Cu(I) catalyst generated in situ.

Templates YDC and YTC (FIGS. 19A-19D) serve to demonstrate successive chemical ligations. Both YDC and YTC use individual tags which simultaneously contain both azido and TIPS-protected alkyne functionalities. Template YTC demonstrates three successive cycles of tagging as may be used to encode three steps of chemical library generation.

All of the above templates were tested for primer extension through and beyond the click-ligation linkages to demonstrate that ligated tags are readable, and therefore that encoded information is recoverable.

Template-dependent polymerization using “single-click” template Y55: A large set of polymerases was tested to read through a triazole click linkage (FIG. 18A). Initial experiments were performed using Cy5-click-primer. In later experiments, FAM-click-primer was used. The fluorophore had no effect on the copying of the template, i.e., the results were equivalent using either primer. As control templates, DNA55-control and rDNA55-control were used (the latter to test the effect of a single ribonucleotide in the template, since propargyl-G used for a click ligation is a ribonucleotide derivative).

Expected full length products in all three templates have the same molecular weight, which is 17446 Da (FAM primer) (FIG. 18B) or 17443 Da (Cy5 primer). A small amount of the product which corresponds to primer extension up to, but stopping at, the click ligation linkage (11880 Da) was also observed for some polymerases.

A set of polymerases that can produce a substantial degree of read-through of the click linkage (production of full-length cDNA) was discovered and is tabulated below.

Full-length cDNA yields of over 50%
Klenow fragment of E. coli DNA polymerase I
Klenow fragment (exo-)
E. coli DNA polymerase I
Therminator™
9° N™
Superscript III™ supplemented with 1 mM MnCl₂

The highest yields (over 80% read-through at a single click junction) were achieved when using Klenow fragment with incubation at 37° C. (FIG. 18B). A somewhat lower yield was observed using E. coli DNA polymerase I. Yields of 50% were achieved with Therminator™ and 9° N™ polymerases, as well as with Klenow fragment (exo-).

Superscript III™ reverse transcriptase produced about 50% yield of cDNA when the buffer was supplemented with 1 mM MnCl₂. However, manganese caused the mis-incorporation of nucleotides, which was observed by MS, i.e., polymerization fidelity was reduced.

Template-dependent polymerization using “single-click” template Y185: Template Y185 features the same primer binding site as all templates used in this example; however, because of its different tailpiece (B-azido), the distance from the last nucleotide of the primer binding site to the click linkage is 8 nucleotides, as compared to 20 nucleotides in Y55 and all other templates. The template was used to test whether transcription of a click linkage was still possible when the enzyme was in the initiation-early elongation conformation. Klenow was capable of copying the Y185 template with similar efficiency to Y55, opening the possibility of reducing the length of the click-ligated encoding tags (FIG. 18C).

Template-dependent polymerization using double and triple click-ligated templates YDC and YTC: After establishing that the Klenow fragment was the most efficient enzyme to read through the click ligation linkages under the assay conditions employed, cDNAs using YDC and YTC templates (FIGS. 20A-20C) were also generated. Primer extension reactions with both YDC and YTC templates produced full length products. Other observed products, which composed around 10-15% of total reaction output, corresponded to partially extended primer stalled at each click junction, e.g., 11880 Da and 16236 Da. The yields were measured by LC-MS analysis in the presence of the internal standard and were about 80-90% per junction (i.e., around 85% for 1 click, 55% for 2-click and 50% for 3-click templates, see FIG. 21).
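One way to relate overall full-length yield to a per-junction figure is to treat each click junction as an independent step with the same read-through probability, so that the overall yield is the per-junction yield raised to the number of junctions. The sketch below applies that simplification to the overall yields quoted above. The independence assumption and the function names are introduced for illustration only and are not stated in the text, which reports yields measured by LC-MS against an internal standard; under this simplification the back-calculated values need not coincide exactly with the per-junction range quoted above.

```python
# Overall full-length yields quoted in the text, keyed by number of click junctions.
OVERALL_YIELD = {1: 0.85, 2: 0.55, 3: 0.50}


def per_junction_yield(overall: float, junctions: int) -> float:
    """Geometric-mean read-through per junction, assuming independent, equal junctions."""
    return overall ** (1.0 / junctions)


def expected_overall(per_junction: float, junctions: int) -> float:
    """Overall yield predicted from a uniform per-junction yield."""
    return per_junction ** junctions


if __name__ == "__main__":
    for n, overall in OVERALL_YIELD.items():
        pj = per_junction_yield(overall, n)
        print(f"{n} junction(s): overall {overall:.0%}, implied per junction {pj:.1%}")
```

The helper `expected_overall` runs the same relation in the other direction, e.g., to estimate how the full-length yield would fall as more chemically ligated tags are added.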

The product of YDC transcription lacked 1 dA nucleotide (calculated 22110, observed 21797 Da; −313, dA) (FIG. 20B) and the product of YTC transcription lacked 2 dA nucleotides (calculated 26773, observed 26147 Da; −626, 2×dA) (FIG. 20C). This correlates with the number of propargyl U nucleotides in the template. Without wishing to be limited by mechanism, it can be hypothesized that Klenow skipped over those U's in the context of a T-triazole-U junction. In contrast, the propargyl G nucleotide in the first click junction was correctly copied.
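The mass expected for a cDNA that omits one dA per propargyl U in the template follows from simple subtraction. The sketch below uses the calculated full-length masses and the 313 Da per-dA shift quoted above; the dictionary layout and the function name are assumptions made for the example.

```python
DA_SHIFT_DA = 313  # mass lost per skipped dA (Da), as quoted in the text

# Calculated full-length cDNA masses (Da), as quoted in the text.
CALCULATED_MASS = {"YDC": 22110, "YTC": 26773}

# Number of propargyl U nucleotides in each template
# (one per ClickTag-TIPS incorporated).
PROPARGYL_U_COUNT = {"YDC": 1, "YTC": 2}


def expected_mass_after_skips(calculated: int, skipped: int) -> int:
    """Mass expected if `skipped` dA nucleotides are missing from the cDNA."""
    return calculated - skipped * DA_SHIFT_DA


if __name__ == "__main__":
    for name, calc in CALCULATED_MASS.items():
        n = PROPARGYL_U_COUNT[name]
        print(f"{name}: {calc} Da minus {n} x dA = "
              f"{expected_mass_after_skips(calc, n)} Da")
```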

Example 10 Use of 3′-Phosphorothioate/5′-Iodo Tags to Chemically Ligate a Succession of Encoding DNA Tags that Encode a Chemical Library Covalently Installed Upon the 5′-Terminus

Protection of 3′-phosphorothioate on tag: As shown in FIG. 24A, a 5′-iodo-3′-phosphorothioate tag (1 eq.) was dissolved in water to give a final concentration of 5 mM. Subsequently, vinyl methyl sulfone (20 eq.) was added and the reaction was incubated at room temperature overnight. Upon completion of the reaction, the product was precipitated by ethanol.

Library Synthesis (FIG. 24B)

Cycle A: To each well in the split was added single-stranded DNA headpiece (1 eq., 1 mM solution in 500 mM pH 9.5 borate buffer), one cycle A protected tag (1.5 eq.), and splint (1.2 eq.). The chemical ligation was incubated at room temperature overnight. To each well (in the split) was then added one Fmoc amino acid (100 eq.), followed by 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (100 eq.). The chemical reaction was incubated at room temperature overnight. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle A pool was purified using LC and lyophilized to dryness, and then dissolved in water to give a 1 mM final concentration, and piperidine (10% v/v) was added to perform the deprotection of cycle A tag (60° C., 2 h). The deprotected product was precipitated again using ethanol.

Cycle B: The deprotected cycle A pool was dissolved in 500 mM, pH 9.5, borate buffer to give a 1 mM concentration and then split into separate reaction wells (1 eq. of cycle A product in each well). To each well was added one cycle B protected tag (1.5 eq.) and splint (1.2 eq.). The chemical ligation was incubated at room temperature overnight. To each well (in the split) was added a mixture of one formyl acid (100 eq.), diisopropyl carbodiimide (100 eq.) and 1-hydroxy-7-aza-benzotriazole (100 eq.). The chemical reaction was incubated at room temperature overnight. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle B pool was purified using LC and lyophilized to dryness, and then dissolved in water to give a 1 mM final concentration, and piperidine (10% v/v) was added to perform the deprotection of cycle B tag (60° C., 2 h). The deprotected product was precipitated again using ethanol.

Cycle C: The deprotected cycle B pool was dissolved in 500 mM pH 5.5 phosphate buffer to give a 1 mM concentration and then split into separate reaction wells (1 eq. of cycle B product in each well). To each well was added one cycle C tag (1.5 eq.) and splint (1.2 eq.). The chemical ligation was incubated at room temperature overnight. To each well (in the split) was added an amine (80 eq.) and sodium cyanoborohydride (80 eq.). The chemical reaction was incubated at 60° C. for 16 h. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle C pool was purified using LC and lyophilized to dryness.

Example 11 Encoding with Chemically Ligated DNA Tags Using a Pair of Orthogonal Chemistries for Each Successive Tag Ligation Step

Another approach for generation of chemically ligated encoding DNA tags is the use of a pair of orthogonal chemistries for successive ligations (FIG. 22A). Tags that bear orthogonal reactive groups at their ends will not polymerize or cyclize, and the orthogonal nature of successive ligation steps will reduce the frequency of mistagging events. Such approaches require (i) having at least two orthogonal chemistries available for oligonucleotide conjugation, and (ii) an available read-through strategy for each of the junctions thus created (FIGS. 22B and 22C). This approach may also obviate the need for the use of protection groups or capping steps, thereby simplifying the tag ligation process.

Orthogonal chemical ligation strategy utilizing 5′-Azido/3′-Alkynyl and 5′-Iodo/3′-Phosphorothioate ligation for successive steps: An example of the use of two orthogonal chemistries for tag ligation is the combination of 5′-azido/3′-alkynyl and 5′-iodo/3′-phosphorothioate ligations. FIG. 23 shows an exemplary schematic of the synthesis of a 3-cycle orthogonal chemical ligation tagging strategy using these successive ligation chemistries. FIGS. 25A-25B show an example of the use of 3′-phosphorothioate/5′-azido and 3′-propargyl/5′-iodo tags to chemically ligate a succession of orthogonal encoding DNA tags that encode a chemical library covalently installed upon the 5′-terminus.
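The pairing logic behind this orthogonal design can be expressed as a small lookup. The sketch below is a simplified illustration under stated assumptions: it models only which end groups are co-reactive (azido with alkynyl, iodo with phosphorothioate), ignores protecting groups, splints, and reaction conditions, and uses hypothetical names (`REACTIVE_PAIRS`, `can_react`, `self_reactive`) and a tuple representation of a tag as (5′ group, 3′ group).

```python
# Chemically co-reactive pairs named in the text.
REACTIVE_PAIRS = {
    frozenset({"azido", "alkynyl"}),          # azide/alkyne "click" ligation
    frozenset({"iodo", "phosphorothioate"}),  # iodo/phosphorothioate ligation
}


def can_react(group_a: str, group_b: str) -> bool:
    """True if the two end groups form one of the co-reactive pairs."""
    return frozenset({group_a, group_b}) in REACTIVE_PAIRS


def self_reactive(tag: tuple) -> bool:
    """True if a tag's own 5' and 3' groups could react (polymerize or cyclize)."""
    five_prime, three_prime = tag
    return can_react(five_prime, three_prime)


# Tags written as (5' group, 3' group).
same_chemistry_tag = ("azido", "alkynyl")      # same chemistry at both ends
orthogonal_tag_1 = ("azido", "phosphorothioate")
orthogonal_tag_2 = ("iodo", "alkynyl")

if __name__ == "__main__":
    for name, tag in [
        ("same-chemistry tag", same_chemistry_tag),
        ("orthogonal tag 1", orthogonal_tag_1),
        ("orthogonal tag 2", orthogonal_tag_2),
    ]:
        print(f"{name}: self-reactive = {self_reactive(tag)}")
    # The 3' end of one orthogonal tag pairs with the 5' end of the other.
    print(can_react(orthogonal_tag_1[1], orthogonal_tag_2[0]))
    print(can_react(orthogonal_tag_2[1], orthogonal_tag_1[0]))
```

In this model a tag carrying the same chemistry at both ends is self-reactive unless one end is protected, whereas each orthogonal tag can only react with a tag of the other type, so the two types alternate from cycle to cycle.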

Protection of 3′-phosphorothioate on tags: As shown in FIG. 25A, a 5′-azido-3′-phosphorothioate tag (1 eq.) was dissolved in water to give a final concentration of 5 mM. Subsequently, vinyl methyl sulfone (20 eq.) was added and the reaction was incubated at room temperature overnight. Upon completion of the reaction, the product was precipitated by ethanol.

Library Synthesis (FIG. 25B)

Cycle A: To each well in the split was added single-stranded DNA headpiece (1 eq., 1 mM solution in 500 mM pH 9.5 borate buffer), one cycle A tag (1.5 eq.), and splint (1.2 eq.). The chemical ligation was incubated at room temperature overnight. To each well (in the split) was then added one Fmoc amino acid (100 eq.), followed by 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (100 eq.). The chemical reaction was incubated at room temperature overnight. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle A pool was purified using LC and lyophilized to dryness. Fmoc deprotection was performed on the cycle A pool by treating the pool (1 mM in water) with piperidine (10% v/v) for 2 h at room temperature. The deprotected product was precipitated again using ethanol.

Cycle B: The purified cycle A pool was dissolved in 500 mM, pH 7.0 phosphate buffer to give a 1 mM concentration and then split into separate reaction wells (1 eq. of cycle A product in each well). To each well was added one cycle B protected tag (1.2 eq.), copper (II) acetate (2 eq.), sodium ascorbate (4 eq.), and tris-(benzyltriazolylmethyl)amine (1 eq.). The chemical ligation was incubated at room temperature overnight. Upon completion, the products were precipitated (in the split) using ethanol and then diluted to a 1 mM concentration using 500 mM, pH 9.5 borate buffer. To each well (in the split) was then added a mixture of one formyl acid (100 eq.), diisopropyl carbodiimide (100 eq.), and 1-hydroxy-7-aza-benzotriazole (100 eq.). The chemical reaction was incubated at room temperature overnight. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle B pool was then dissolved in water to give a 1 mM final concentration, and piperidine (10% v/v) was added to perform the deprotection of cycle B tag (room temperature, 18 h). The deprotected product was precipitated again using ethanol. The deprotected cycle B pool was purified using LC and lyophilized to dryness.

Cycle C: The purified cycle B pool was dissolved in 500 mM, pH 5.5 phosphate buffer to give a 1 mM concentration and then split into separate reaction wells (1 eq. of cycle B product in each well). To each well was added one cycle C tag (1.5 eq.) and splint (1.2 eq.). The chemical ligation was incubated at room temperature overnight. To each well (in the split) was added an amine (80 eq.) and sodium cyanoborohydride (80 eq.). The chemical reaction was incubated at 60° C. for 16 h. Upon completion, all wells were pooled and the products precipitated using ethanol. The cycle C pool was purified using LC and lyophilized to dryness.

OTHER EMBODIMENTS

All publications, patent applications, and patents mentioned in this specification are herein incorporated by reference.

Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific desired embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the fields of medicine, pharmacology, or related fields are intended to be within the scope of the invention.

1. A method of tagging a first library comprising an oligonucleotide-encoded chemical entity, said method comprising: (i) providing a single-stranded oligonucleotide headpiece having a first functional group and a second functional group; (ii) binding said first functional group of said headpiece to a first component of said chemical entity, wherein said headpiece is directly connected to said first component or said headpiece is indirectly connected to said first component by a bifunctional linker; and (iii) ligating said second functional group of said headpiece to a first building block tag to form a complex, wherein said ligating comprises chemical ligation of one or more chemically co-reactive pairs selected from: (a) an optionally substituted alkyne and an optionally substituted azido group; or (b) a phosphorothioate group and an iodo group; wherein said steps (ii) and (iii) can be performed in any order; and wherein said first building block tag encodes for the binding reaction of said step (ii), thereby providing a tagged library.
2-75. (canceled)
76. The method of claim 1, wherein said chemically co-reactive pair is an optionally substituted alkynyl group and an optionally substituted azido group.
77. The method of claim 1, wherein said chemically co-reactive pair is a phosphorothioate group and an iodo group.
78. The method of claim 77, wherein said phosphorothioate group is at the 5′-terminus of an oligonucleotide and said iodo group is at the 3′-terminus of an oligonucleotide.
79-94. (canceled)
95. The method of claim 77, wherein said phosphorothioate group is at the 3′-terminus of an oligonucleotide and said iodo group is at the 5′-terminus of an oligonucleotide.
96. The method of claim 77, wherein said second functional group of said headpiece is a phosphorothioate group.
97. The method of claim 76, wherein said azido group is at the 5′-terminus of an oligonucleotide and said alkyne group is at the 3′-terminus of an oligonucleotide.
98. The method of claim 76, wherein said azido group is at the 3′-terminus of an oligonucleotide and said alkyne group is at the 5′-terminus of an oligonucleotide.
99. The method of claim 76, wherein said second functional group of said headpiece is an alkyne group.
100. The method of claim 1, wherein said chemical ligation further comprises a splint oligonucleotide in the chemical ligation reaction between said chemically co-reactive pair.
101. The method of claim 1, wherein said chemically co-reactive pair produces a spacer having a length from about 4 to about 24 atoms.
102. The method of claim 1, wherein the method further comprises: (iv) binding a single-stranded oligonucleotide second building block tag to the 5′-terminus or 3′-terminus of said complex; and (v) binding a second component of said chemical library to said first component, wherein said steps (iv) and (v) can be performed in any order; wherein the second building block tag encodes for the binding reaction of step (v); and wherein step (iv) is carried out using chemical ligation comprising use of one or more chemically co-reactive pairs selected from: (a) an optionally substituted alkynyl group and an optionally substituted azido group; or (b) a phosphorothioate group and an iodo group.
103. The method of claim 102, wherein said chemical ligation further comprises a splint oligonucleotide in the binding reaction between said chemically co-reactive pair.
104. The method of claim 102, wherein said chemical ligation of said first building block tag and said chemical ligation of said second building block tag comprise orthogonal chemically co-reactive pairs for ligating successive building block tags, wherein said orthogonal chemically co-reactive pairs comprises (a) an optionally substituted alkynyl group and an optionally substituted azido group; and (b) a phosphorothioate group and an iodo group.
105. The method of claim 102, wherein: (a) said method further comprises separating said complex from any unreacted tag or unreacted headpiece before any one of binding steps (ii)-(v); (b) said method further comprises purifying said complex before any one of binding steps (ii)-(v); and/or (c) said method further comprises binding one or more additional building block tags to said complex and binding one or more additional components to said complex.
106. The method of claim 105, wherein said method further comprises binding one or more additional building block tags to said complex and binding one or more additional components to said complex, wherein said binding one or more additional building block tags comprises chemical ligation of one or more additional building block tags using one or more chemically co-reactive pairs for ligating successive building block tags.
107. The method of claim 106, wherein said chemical ligation of one or more additional building block tags comprises orthogonal chemically co-reactive pairs for ligating successive building block tags, wherein said orthogonal chemically co-reactive pairs comprises (a) an optionally substituted alkynyl group and an optionally substituted azido group; and (b) a phosphorothioate group and an iodo group.
108. The method of claim 102, wherein said complex, said headpiece, said first building block tag, said second building block tag, and/or said one or more additional building block tags, if present, comprises a modified phosphate group between the terminal nucleotide at the 3′-terminus and the nucleotide adjacent to said terminal nucleotide.
109. The method of claim 1, wherein said headpiece comprises a hairpin structure.
110. The method of claim 1, wherein: (a) said headpiece, said first building block tag, said second building block tag, and/or said one or more additional building block tags, if present, comprises from 5 to 20 nucleotides; (b) said headpiece, said first building block tag, said second building block tag, and/or said one or more additional building block tags, if present, further comprises a first library-identifying sequence; (c) said method further comprises binding a first library-identifying tag to said complex; (d) said headpiece, said first building block tag, said second building block tag, and/or said one or more additional building block tags, if present, further comprises a use sequence and/or an origin sequence; (e) said method further comprises binding a use tag and/or an origin tag to said complex; and/or (f) said method further comprises binding a tailpiece to said complex.